Abstract



Effecting coordination across remote sites in a distributed system is an es- 
sential part of distributed computing, and also an inherent challenge. Bereft 
of telepathy and other extrasensory perceptional powers, the processes must 
rely on message passing in order to achieve it. 

In 1978, a fascinating analysis of communication in asynchronous systems
was suggested by Leslie Lamport [26]. Lamport takes his cue from the
theory of special relativity, where the bounded expansion of light through 
space and time marks the limits of causal affectability: nothing can travel 
faster than light, and so causal influence too must be limited by the speed 
of light. Of course, in typical distributed systems nothing as exotic as trav- 
eling at near the speed of light ever comes up. But here, in analogy to light, 
causal influence cannot travel faster than the messages that traverse the 
inter-process void do. The import of Lamport's paper for distributed computing
cannot be overestimated. The causal analysis determines a notion
of temporal precedence, a sort of weak notion of time, which is otherwise 
missing in asynchronous systems. This notion has been extensively utilized 
in various applications. 

Yet Lamport's analysis, and the reliant body of research that has been 
conducted since, is mostly limited to systems that are asynchronous. In this 
thesis we go beyond the existing body of literature by investigating causality 
in synchronous systems. In such systems, the boundaries of causal influence
are not charted out exclusively by message passing. Here time itself, passing 
at a uniform (or almost uniform) rate for all processes, is also a medium by 
which causal influence may fan out. This thesis studies, and characterizes, 
the intricate combinations of time and message passing that govern causal 
influence in synchronous systems. 

It turns out that knowledge-based analysis [15] provides a well-tailored
formal framework within which causal notions can be studied. As we show, 
the formal notion of knowledge is highly appropriate for characterizing causal 
influence in terms of information flow. The idea of using knowledge in such
circumstances was first raised by Chandy and Misra [7]. We broaden
their analysis and deepen its methodological infrastructure. 

In order to study coordination rigorously, we define several generic classes 
of coordination problems that pose various temporal ordering requirements 
on the participating processes. These coordination problems provide nat- 
ural generalizations of real life requirements. We then analyze the causal 
conditions that underlie suitable solutions to these problems. The analysis
is conducted in two stages: first, the temporal ordering requirements are 
reduced to epistemic conditions. Then, these epistemic conditions are char- 
acterized in terms of the causal communication patterns that are necessary 
and sufficient to bring them about. 

Whilst in asynchronous systems causal influence is characterized by a
straightforward application of the temporal precedence order defined by 
Lamport, in synchronous systems the causal communication patterns are 
more complex. We identify several such patterns, each of them being a 
minimal requirement in some class of coordination problems: we start with 
syncausality, an immediate generalization of Lamport's ordering, and move 
on to centipedes and centibrooms, structures that combine message passing 
and timing constraints. These latter two are shown to be special cases of the 
generalized centipede. These patterns lead us up in an increasingly complex 
hierarchy of ordering requirements, culminating in a characterization of the 
minimal communication pattern that is necessary to ensure any specification 
given as a partial ordering on the temporal precedence of events. 
Chapter 1

Introduction 



1.1 Causal Analysis in Distributed Systems 

In distributed systems, a group of autonomous processes with limited means 
of communication are typically set to cooperate and coordinate their local 
actions in order to achieve a system-wide global requirement. In general,
the less processes know of actions and of occurrences at remote sites, the 
more difficult the task of coordination becomes. 

Of particular difficulty is achieving coordination in asynchronous sys- 
tems, where no guarantees are given regarding the rate at which each process 
proceeds, and message delivery can be indefinitely postponed. In a semi- 
nal paper [26], Lamport proposed the happened-before relation between 
events in asynchronous systems, and based on this relation a mechanism for 
logical clocks that allow processes to exercise some control over the ordering 
of events. 

Lamport takes his lead from the theory of special relativity, where the 
way by which light dissipates in space over time determines upper bounds 
on the spread of information and of causality in general, as nothing can 
get from source to target faster than light itself. Applying this analogy to 
distributed systems, Lamport notes that in asynchronous systems, causality 
and information cannot travel faster than the messages that are sent and 
received between the processes. This suggests that the following relation on 
events applies to e and e' whenever the occurrence of e' is causally dependent 
upon occurrence of e.¹

Definition 1 (Happened-before) Fix an execution r of the system. The
happened-before relation over events of r is the smallest relation
satisfying the following conditions:

1. If e and e' are events in the same process, and e comes before or with
e', then $e \to e'$.

2. If e is the sending of a message by one process and e' is the receipt of
the same message by another process, then $e \to e'$.

3. If $e \to e''$ and $e'' \to e'$ then $e \to e'$.

¹The formulation differs slightly from that of [26] as we do not impose irreflexivity, for
the sake of a simpler formulation.



As messages are never delivered before they are sent, happened-before 
implies that whenever $e \to e'$, the occurrence of event e temporally precedes
(or is simultaneous with) the occurrence of e', even if the events occur at
distinct sites. Thus, a partial ordering that is implicit in every execution of 
a distributed system is made explicit. 
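
To make the definition concrete, the following is a minimal Python sketch that computes the happened-before relation of a small finite execution as the smallest relation closed under the three clauses. The function name, the input encoding (a per-process list of events plus a list of send/receive pairs) and the sample execution are our own illustrative assumptions, not part of the formal development.

```python
from itertools import product


def happened_before(local_order, messages):
    """Return the happened-before relation as a set of (e, e') pairs.

    local_order: dict mapping each process to its events in local order
                 (an assumed encoding of an execution).
    messages:    list of (send_event, receive_event) pairs.
    """
    events = [e for evs in local_order.values() for e in evs]
    rel = set()
    # Clause 1: same process, e comes before or with e' (reflexive here).
    for evs in local_order.values():
        for a in range(len(evs)):
            for b in range(a, len(evs)):
                rel.add((evs[a], evs[b]))
    # Clause 2: the sending of a message precedes its receipt.
    rel.update(messages)
    # Clause 3: transitive closure (Warshall-style, intermediate event k
    # is the outermost loop variable of the product).
    for k, i, j in product(events, repeat=3):
        if (i, k) in rel and (k, j) in rel:
            rel.add((i, j))
    return rel


# Hypothetical execution: p sends at a (received by q at d),
# q sends at e (received by r at f).
local_order = {"p": ["a", "b"], "q": ["c", "d", "e"], "r": ["f"]}
messages = [("a", "d"), ("e", "f")]
hb = happened_before(local_order, messages)
```

In this sample execution, a is related to f through the message chain a, d, e, f, whereas b and d remain unrelated because no message chain connects them.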

The event ordering that is determined by the happened-before relation 
is sometimes referred to as a causal ordering. Causality is an elusive con- 
cept whose nature has been widely contested over the centuries. Lamport's 
relation circumvents these sticky philosophical issues, in the following sense. 
Whenever events e, e' do occur in an execution and $e \not\to e'$, then event e
cannot be a cause of event e' under any interpretation of causality, as its
effect has not reached the site of e' by the time it occurs. Note that, strictly
speaking, the converse scenario where $e \to e'$ does hold can only mean that
e is a potential cause of e', as the occurrence of e' may have been
nondeterministic, or based on the occurrence of events other than e.

Lamport offers an immediate application for causal ordering. Logical 
clocks are defined as local counters ($Clock_i$ for each process i), that assign
a number to each local event. By timestamping each message sent with the
current value of the sender's counter, a simple mechanism is devised to make
sure for all events e, e' occurring at sites i, j respectively, that
$Clock_i(e) < Clock_j(e')$ whenever $e \to e'$ and $i \ne j$.
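
The mechanism can be sketched in a few lines of Python. This is a hedged illustration of the standard logical clock update rule (increment on every local event, take the maximum with the incoming timestamp on receipt); the class and method names are our own and do not appear in [26].

```python
class Process:
    """A process holding a Lamport-style logical clock (illustrative names)."""

    def __init__(self, name):
        self.name = name
        self.clock = 0

    def local_event(self):
        # Every local event advances the counter and receives its value.
        self.clock += 1
        return self.clock

    def send(self):
        # A send is an event; its clock value is the message timestamp.
        self.clock += 1
        return self.clock

    def receive(self, timestamp):
        # Jump past the sender's timestamp, so the receipt is numbered
        # strictly higher than the corresponding send.
        self.clock = max(self.clock, timestamp) + 1
        return self.clock


p, q = Process("p"), Process("q")
q.local_event()
q.local_event()
first_stamp = p.send()             # clock of the send event at p
first_recv = q.receive(first_stamp)
for _ in range(3):
    p.local_event()
second_stamp = p.send()
second_recv = q.receive(second_stamp)
```

In both exchanges the receive event is assigned a larger number than the send event, regardless of which process was "ahead" beforehand.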

The immense import of Lamport's paper on the development of theoret- 
ical and practical distributed systems cannot be overestimated. For us it is 
important to summarize by saying that Lamport defined a relation, based 
on communication patterns (the inter-process message chains that establish 
the $\to$ relation), that traces the dissemination of causal effect in a system.
Moreover, he showed how this causal ordering can be used to establish a 
temporal ordering on events. 
Chandy and Misra's follow-up paper [7] explicitly relates Lamport's relation
to knowledge. This paper offers a reinterpretation of Lamport's ideas
in terms of knowledge, rather than of coordination. A more in-depth cover- 
age of knowledge in distributed systems is offered in Section 1.3, while the 
current discussion will be kept to an intuitive level. In distributed systems 
each process is immediately acquainted only with its own local state. Thus, 
facts that pertain to local states of remote sites may be hidden from it. Now 
consider a run (or execution) of the system where, at the current time, process
$i$'s local state is $\ell$. If, when one looks at all possible runs, an arbitrary
fact $\varphi$ holds true of the system whenever $i$'s local state is $\ell$, then $i$ is said to
"know" that $\varphi$: there is simply no way that, given its local state $\ell$, fact $\varphi$
could fail to hold. How would process $i$ come to know that, say, the value
of process $j$'s local variable $X_j$ is 10?

A simple answer can be given if $X_j = 10$ is an invariant specified by the
protocol. To filter away such "uninteresting" cases, what if at time $t$ process
$j$ itself does not know that $X_j = 10$, and at time $t' > t$ process $i$ knows that
process $j$ knows that $X_j = 10$?² Chandy and Misra call such a development
knowledge gain, where process $i$ comes to gain new knowledge about the
state of process $j$. They surmise, and then prove, that in such a case it
must be that process $j$ at time $t$ is happened-before related to process $i$ at
time $t'$. We will denote such a relation with $(j,t) \to (i,t')$. More generally,
Chandy and Misra show that if at time $t'$ process $i_k$ knows that process
$i_{k-1}$ knows that ... process $i_1$ knows that process $j$ knows that $X_j = 10$,
then it must be that there are times $t = t_0 < t_1 < \cdots < t_k = t'$ such that
$(j, t_0) \to (i_1, t_1) \to \cdots \to (i_k, t_k)$.

While Lamport relates communication to coordination, Chandy and 
Misra relate it to knowledge gain. In both cases the happened-before re- 
lation can be seen to give as good a characterization as can be achieved of 
the spread of causal effect in the system. However, once we have formalized 
the notions of knowledge and of coordination with which the thesis deals, 
we will show in Section 2.3 that knowledge gain is a necessary condition for 
coordination, and thus provides a "closer to home" approximation of causal- 
ity than coordination. As such, we will study it extensively in the thesis, 
with the aim of giving a precise understanding of causality in synchronous 
systems. 



²Since $X_j$ is a part of $j$'s local state, by definition process $j$ will know its value at all
times, so if $j$ doesn't know that $X_j = 10$, it must be that $X_j \ne 10$.
1.2 Causality, Knowledge, Coordination



Roughly sketched, the scenery drawn out by Lamport and by Chandy and 
Misra for asynchronous systems shows that communication is prerequisite 
for knowledge gain and that, similarly, knowledge gain is necessary for the 
coordinated ordering of events. These relations justify associating Lamport's 
happened-before relation with causality in such settings. 

In this thesis we will investigate causality as it manifests itself in syn- 
chronous settings. Example 1, presented in the next chapter, will show us 
that the happened-before relation no longer characterizes causal relations 
in their entirety under synchrony. Our main goal will be to identify the
communication patterns that do characterize causality here.

Our method is to define various scenarios where knowledge gain, as a rigorously
defined approximation of causality, takes place. Each of the following
chapters is dedicated to such a scenario. In Chapters 3 and 4 we provide the
Ordered and Simultaneous Response problems as motivating leads. Given
the necessity of nested and common knowledge gain for the OR and SR problems
respectively, characterizing solutions to these coordination problems in
terms of causality largely reduces to an analysis of knowledge gain in
such terms.

The study of causal relations leading to knowledge gain is thus relevant 
in the context of two differing research programmes:

• Focussing on the relations between knowledge gain and causality, we
hope to make the thesis results instrumental in the widely defined
field of epistemic analysis in multi-agent systems. The thesis results
may be applicable in linguistic study [28], as well as in game-theoretic
analysis of interactive epistemics [2, 8], and possibly also in the
philosophical analysis of causality [44, 51].

• By encompassing also the relations between knowledge and coordina- 
tion, we relate coordination directly to communication. Unlike knowl- 
edge, coordination and communication are both tangible, and results 
characterizing one in terms of the other would be easier to apply. 

Thus, even in the context of more applied study of distributed
systems, knowledge-based analysis can be made to play a subtle, if
highly beneficial, role. Knowledge is a powerful tool for extracting
underlying generalizations in such systems and is our best approximation
for causal phenomena. Once these generalizations have been
properly characterized, direct connections between them can be drawn
out, largely rendering the interpretive epistemic layer obsolete.

One final guiding principle for the inquiries made in this thesis needs to
be mentioned. It is widely understood that different process protocols lead
to widely varying characteristics for the system as a whole. Nevertheless, our
key results are not protocol-dependent, and in this sense they characterize
all synchronous systems. We adhere to the idea of characterizing systems
rather than protocols throughout the thesis, and even where protocol-specific
results are given, they bear significance for all systems (by showing that
our definitions are tight). The one exception to this guideline is made in 
Chapter 7, where gaining knowledge of ignorance is discussed. 

1.3 The Interpreted Systems Framework 

Results pertaining to knowledge gain in distributed systems provide the 
main formal backbone of this thesis. For this reason we utilize the interpreted 
systems framework of Fagin, Halpern, Moses, and Vardi [15]. We shall 
simplify its exposition somewhat here, and review just enough of the details 
to support the formal analysis. Essentially all of the definitions in this 
section are taken from [15]. 

Informally, we view a multi-process system as consisting of a set
$\mathbb{P} = \{1, \ldots, n\}$ of processes connected by a communication network. We assume
that, at any given point in time, each process in the system is in some local
state. A global state is just a tuple $g = (\ell_e, \ell_1, \ldots, \ell_n)$ consisting of local
states of the processes, together with the state $\ell_e$ of the environment. The
environment's state accounts for everything that is relevant to the system 
that is not contained in the state of the processes. 

A run is a function from time to global states. Intuitively, a run is a 
complete description of what happens over time in one possible execution 
of the system. A point is a pair $(r, t)$ consisting of a run $r$ and a time $t$. If
$r(t) = (\ell_e, \ell_1, \ldots, \ell_n)$, then we use $r_i(t)$ to denote process $i$'s local state $\ell_i$ at
the point $(r, t)$, for $i = 1, \ldots, n$, and $r_e(t)$ to denote $\ell_e$. For simplicity, time
here is taken to range over the natural numbers rather than the reals (so
that time is viewed as discrete, rather than dense or continuous). Round $t$
in run $r$ occurs between time $t-1$ and $t$.

We identify a protocol for a process i with a function from local states of i 
to nonempty sets of actions. (We mostly consider deterministic protocols, 
in which each local state is mapped to a singleton set of actions. Such a 
protocol essentially maps local states to actions.) A joint protocol is just a
sequence of protocols $P = (P_1, \ldots, P_n)$, one for each process.

We generally study knowledge in runs of a given protocol $P$ in a particular
setting of interest. To do this, we separately describe the setting,
or context, in which $P$ is being executed. Formally, a context $\gamma$ is a tuple
$(\mathcal{G}_0, P_e, \tau)$, where $\mathcal{G}_0$ is a set of initial global states, $P_e$ is a protocol for the
environment, and $\tau$ is a transition function.³ The environment is viewed
as running a protocol (denoted by $P_e$) just like the processes; its protocol
is used to capture nondeterministic aspects of the execution, such as the
actual transmission times, external inputs into the system, etc. The transition
function $\tau$ describes how the actions performed by the processes and
the environment change the global state. Thus, if $g$ is a global state and
$a = (a_e, a_1, \ldots, a_n)$ is a joint action (consisting of an action for the environment
and one for each of the processes), then $\tau(a, g) = g'$ specifies that $g'$
is the state that results when $a$ is performed in state $g$. When modeling
asynchronous systems, we assume that some processes will be executing a
NULL action, of which they are not even aware (their local states are left
unaltered).

A run $r$ is consistent with a protocol $P$ if it could have been generated
when running protocol $P$. Formally, run $r$ is consistent with joint protocol
$P$ in context $\gamma$ if

1. $r(0) \in \mathcal{G}_0$, so that it starts from a $\gamma$-legal initial global state, and

2. for all $m \ge 0$, the transition from global state $r(m)$ to $r(m + 1)$ is the
result of performing one of the joint actions specified by $P$ and the
environment protocol $P_e$ (the latter is specified in $\gamma$) in the global
state $r(m)$. That is, if $P = (P_1, \ldots, P_n)$, $P_e$ is the environment's
protocol in context $\gamma$, and $r(m) = (\ell_e, \ell_1, \ldots, \ell_n)$, then there must be
a joint action $a = (a_e, a_1, \ldots, a_n)$ such that $a_e \in P_e(\ell_e)$, $a_i \in P_i(\ell_i)$ for
$i = 1, \ldots, n$, and $r(m + 1) = \tau(a, r(m))$ (so that $r(m + 1)$ is the result
of applying the joint action $a$ to $r(m)$).

We use $\mathcal{R}(P, \gamma)$ to denote the set of all runs of $P$ in $\gamma$, and call it the system
representing $P$ in context $\gamma$.
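
The two consistency conditions can be illustrated by a small Python sketch over finite run prefixes. Everything here is a simplifying assumption made for illustration: global states are tuples whose first component is the environment's state, protocols are functions returning lists of actions, and the toy context (an environment that feeds one bit per round to a single process) is hypothetical.

```python
from itertools import product


def joint_actions(g, env_protocol, protocols):
    """All joint actions (a_e, a_1, ..., a_n) allowed in global state g."""
    env_state, local_states = g[0], g[1:]
    choices = [env_protocol(env_state)]
    choices += [p(l) for p, l in zip(protocols, local_states)]
    return product(*choices)


def is_consistent(prefix, initial_states, env_protocol, protocols, tau):
    """Check conditions 1 and 2 on a finite prefix r(0), ..., r(k)."""
    if prefix[0] not in initial_states:          # condition 1
        return False
    return all(                                  # condition 2
        any(tau(a, g) == g_next
            for a in joint_actions(g, env_protocol, protocols))
        for g, g_next in zip(prefix, prefix[1:])
    )


def run_prefixes(initial_states, env_protocol, protocols, tau, horizon):
    """Enumerate every consistent prefix of the given length."""
    prefixes = [[g] for g in initial_states]
    for _ in range(horizon):
        prefixes = [prefix + [tau(a, prefix[-1])]
                    for prefix in prefixes
                    for a in joint_actions(prefix[-1], env_protocol, protocols)]
    return prefixes


# Toy context: the environment state is the time, the environment picks a
# bit each round, and the single process records the bits it has seen.
initial_states = [(0, ())]
env_protocol = lambda env_state: [0, 1]
protocols = [lambda local_state: ["noop"]]


def tau(a, g):
    a_e, _a_1 = a
    t, seen = g
    return (t + 1, seen + (a_e,))


system = run_prefixes(initial_states, env_protocol, protocols, tau, horizon=2)
```

With a horizon of two rounds the toy system contains four prefixes, one per sequence of environment choices, and each of them passes the consistency check.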

A description of the specific context $\gamma^{s}$ that we deal with throughout
the thesis is found in Section 1.5. 

³Depending on the application, a context can include additional components, to account
for fairness assumptions, probabilistic assumptions, etc. Moreover, additional aspects
of a context that are usually suppressed from the notation are nonempty sets Int
and Ext of internal actions for the processes and external inputs, respectively.
1.4 Defining Knowledge in a Distributed System



We aim at a logical analysis of gained knowledge regarding the occurrence
of events. The interpreted systems framework [15] provides us with much
of the necessary machinery here. We focus on a simple logical language
in which the set $\Phi$ of primitive propositions consists of propositions of the
form occurred(e), ND(e) and time = t for all events e and times t. To
obtain the logical language $\mathcal{L}$, we close $\Phi$ under propositional connectives
and knowledge formulas. Thus, $\Phi \subseteq \mathcal{L}$, and if $\varphi \in \mathcal{L}$, $i \in \mathbb{P}$ and $G \subseteq \mathbb{P}$, then
$\{K_i\varphi, E_G\varphi, C_G\varphi\} \subseteq \mathcal{L}$. The formula $K_i\varphi$ is read process $i$ knows $\varphi$, $E_G\varphi$
is read everyone in $G$ knows $\varphi$, and $C_G\varphi$ is read $\varphi$ is common knowledge
to $G$. In addition, we add a timestamping operator as well. Thus, if $\varphi \in \mathcal{L}$
and $t \in \mathbb{N}$, then $At_t\varphi \in \mathcal{L}$.⁴

The truth of a formula is defined with respect to a triple $(R,r,t)$. We
write $(R,r,t) \models \varphi$ to state that $\varphi$ holds at time $t$ in run $r$, with respect to
system $R$. It is always assumed that $r \in R$ in a triple $(R,r,t)$. The precise
meaning of nondeterminism in this system is given in Section 1.5 below.
Denoting by $r_i(t)$ process $i$'s local state at time $t$ in $r$, we inductively define

$(R,r,t) \models ND(e)$ iff event $e$ occurs in $r$ and is nondeterministic there;

$(R,r,t) \models occurred(e)$ iff event $e$ occurs at time $t'$ in $r$ such that $t' \le t$;

$(R,r,t) \models time = t'$ iff $t' = t$;

$(R,r,t) \models At_{t'}\varphi$ iff $(R,r,t') \models \varphi$;

$(R,r,t) \models K_i\varphi$ iff $(R,r',t') \models \varphi$ for every run $r' \in R$ and time $t'$ satisfying $r_i(t) = r'_i(t')$;

$(R,r,t) \models E_G\varphi$ iff $(R,r,t) \models K_i\varphi$ for every $i \in G$; and

$(R,r,t) \models C_G\varphi$ iff $(R,r,t) \models (E_G)^k\varphi$ for every $k \ge 1$.

Propositional connectives are handled in the standard way, and their clauses
are omitted above. In some cases it will be convenient to also syntactically
derive the proposition $(R,r,t) \models occurs(e)$ iff
$(R,r,t) \models occurred(e) \wedge (time = 0 \vee At_{t-1}\neg occurred(e))$,
so $(R,r,t) \models occurs(e)$ holds true if e occurs exactly at time t in r.

⁴In this thesis we do not investigate complexity and decidability issues pertaining to the
use of explicitly timestamped formulas. The system's existing constraints on transmission
times require some sort of temporal metric on formulas, and we opt for this choice based
on the clarity and conciseness that it offers.

By definition, $K_i\varphi$ is satisfied at $r \in R$ and time $t$ if $\varphi$ holds at all
points at which $i$ has the same local state as in $(r,t)$. Thus, given $R$, the
local state determines what processes know. Intuitively, a fact $\varphi$ is common
knowledge to $G$ if everyone in $G$ knows $\varphi$, everyone knows that everyone
knows $\varphi$, and so on ad infinitum. In particular, if $(R, r, t) \models C_G\varphi$ then
$(R, r, t) \models K_{i_k}K_{i_{k-1}} \cdots K_{i_1}\varphi$, for every string $K_{i_k}K_{i_{k-1}} \cdots K_{i_1}$ and $k > 0$.

We write $R \models \varphi$ and say that "$\varphi$ is valid in $R$" if $(R, r, t) \models \varphi$ holds for
all $r \in R$ and $t \ge 0$. A formula $\varphi$ is valid, written $\models \varphi$, if it is valid in all
systems $R$.

It is convenient to treat boundary cases for some of the operators in
the following way: for $|G| = 0$ we have $R \models E_G\varphi$ for all $\varphi$, and hence also
$R \models C_G\varphi$. For all $G$, we say that $(R, r, t) \models (E_G)^0\varphi$ iff $(R, r, t) \models \varphi$.

In the context of distributed systems, the knowledge operator embodies
an important function that is often left unstated. Intuitively, a process
"knows" $\varphi$ if it is in possession of ample evidence that $\varphi$ is true. Essentially,
all such evidence must be based on the local state of the process. The local
states of other processes are not immediately available for it to inspect. We
consider the local state $\ell$ of process $i$ as "ample evidence that $\varphi$", if at every
possible point in the execution of the distributed system where the local
state of $i$ is $\ell$, $\varphi$ holds. Thus knowledge can be seen as a localizing qualifier
for the inner formula $\varphi$. To see this, consider the two statements below.

• $(R, r, t) \models \varphi$, and

• $(R, r, t) \models K_i\varphi$

The former statement is straightforward: $\varphi$ obtains at the distributed
system in run $r$ at time $t$. The latter statement makes a stronger claim:
not only does $\varphi$ obtain in $r$ at $t$, but it also holds in every possible point in
which $i$'s local state is the same as it is now (at the point $(r, t)$).

Formulas pertaining to nested knowledge, such as $(R, r, t) \models K_jK_i\varphi$, can
now also be given a formal interpretation. What it means is that process
$j$'s local state at $(r,t)$ provides ample evidence to support the claim that
process $i$'s local state at $(r, t)$ provides ample evidence that $\varphi$ is true at $(r, t)$.
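
The reading of knowledge as truth at all points sharing the same local state can be checked mechanically on a finite system. The Python sketch below is only an illustration under simplifying assumptions of our own: a system is a list of finite runs, a global state is a tuple of local states (the environment is omitted), a formula is a predicate on points, and the two-run example is hypothetical.

```python
def knows(runs, i, phi):
    """Return the predicate K_i phi over points (r, t) of the system."""
    def k_phi(r, t):
        here = runs[r][t][i]
        # phi must hold at every point where i has the same local state.
        return all(phi(r2, t2)
                   for r2 in range(len(runs))
                   for t2 in range(len(runs[r2]))
                   if runs[r2][t2][i] == here)
    return k_phi


def everyone_knows(runs, group, phi):
    """Return the predicate E_G phi over points (r, t) of the system."""
    def e_phi(r, t):
        return all(knows(runs, i, phi)(r, t) for i in group)
    return e_phi


# Hypothetical system with two processes and two runs. Local states are
# (time, data). In run 0 process 0 holds x = 1 and its message reaches
# process 1 at time 1; in run 1 process 0 holds x = 0 and stays silent.
runs = [
    [((0, 1), (0, None)), ((1, 1), (1, "x=1"))],
    [((0, 0), (0, None)), ((1, 0), (1, None))],
]
x_is_one = lambda r, t: runs[r][t][0][1] == 1

k0 = knows(runs, 0, x_is_one)
k1 = knows(runs, 1, x_is_one)
k0_k1 = knows(runs, 0, k1)
e_all = everyone_knows(runs, [0, 1], x_is_one)
```

At time 0 of run 0 only process 0 knows that x = 1; after the message is received at time 1, process 1 knows it too, and process 0 knows that process 1 knows.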
1.5 The Synchronous Model



1.5.1 The Synchronous Context $\gamma^{s}$

We are interested in characterizing the effect of synchronous constraints on 
distributed systems. We therefore define a class of synchronous contexts 
that ensure the following properties for all systems defined on top of them: 

• The set of processes is denoted by $\mathbb{P}$. These are connected by a network
of weighted channels. For each pair of processes $i, j$ connected
by a communication channel, the weights $min_{ij}$ and $max_{ij}$ denote the
minimal and maximal transmission times for messages over the channel,
respectively. In all cases $min_{ij} \ge 1$. Whenever there is no upper
bound on transmission we have that $max_{ij} = \infty$.

• We assume that processes can receive external inputs from the outside 
world. These are determined in a genuinely nondeterministic fashion, 
and are not correlated with anything that comes before in the run, or 
with external inputs currently received by other processes. 

• The scheduler, which we typically call the environment, is in charge of 
choosing the external inputs, and of determining message transmission 
times. The latter are also determined in a nondeterministic fashion, 
subject to the delivery time constraints as detailed by the weights on 
the channels. 

• Time is identified with the natural numbers $\mathbb{N}$, and each process is assumed
to take a step at each time $t \in \mathbb{N}$. For simplicity, the processes
follow deterministic protocols. Hence, a given protocol $P$ for the processes
and a given behavior of the environment completely determine
the run.

• Events are sends, receives, external inputs and internal actions. All 
events in a run are distinct, and we denote a generic event by the 
letter e. For simplicity, events do not take time to be performed. At 
a given time point a process can perform an arbitrary finite set of 
actions. 

We shall, for the most part, be concerned with contexts that are more
restrictive than $\gamma^{s}$. Thus, any $\gamma^{s}$ context where $1 = min_{ij} \le max_{ij} < \infty$ for
all channels $(i,j)$ will be called a $\gamma^{\max}$ context; it is a context whose systems
are characterized by the existence of finite upper bounds on delivery time.
Similarly, a $\gamma^{s}$ context where $1 \le min_{ij} \le max_{ij} = \infty$ for every channel
is a $\gamma^{\min}$ context. This is a context where there are only lower bounds
on transmission times.

Whichever superscript $\alpha$ is used to denote a particular context (as in
$\gamma^{\alpha}$) will also be used to denote a system $\mathcal{R}^{\alpha} = \mathcal{R}(P, \gamma^{\alpha})$, where $P$ is
an arbitrary protocol.

1.5.2 Detailed Specification for $\gamma^{s}$

A synchronous context is defined as a tuple $(\mathcal{G}_0, P_e, \tau)$ whose components
are specified as follows.

The environment's state. Recall that the environment's state keeps
track of relevant aspects of the global state that are not represented in
the local states of the processes. We assume that the environment's state
has three components $\ell_e = (Net, t, Hist_e)$, where

1. $Net$ is a labelled graph $(\mathbb{P}, E, max, min)$ describing the network topology
and bounds on transmission times. Its nodes are processes, and a
directed edge $(i, j) \in E$ captures the fact that there is a channel from $i$
to $j$ in the system. Moreover, the labels $1 \le max_{ij} \in \mathbb{N} \cup \{\infty\}$ and
$1 \le min_{ij} \in \mathbb{N}$ are upper and lower bounds respectively on the time
that a message sent on $(i, j)$ can be in transit. The contents of $Net$ are
not affected by $\tau$, and so $Net$ remains constant throughout the run.

2. The variable $t$ keeps track of global time. As we shall see, its value
starts at $t = 0$, and advances by 1 following each round. Finally,

3. $Hist_e$ records the sequence of joint actions performed so far. The $Hist_e$
component uniquely determines the contents of all channels.⁵ Indeed,
a message $\mu$ is in transit at a given global state $g$ if the $Hist_e$ component
in $g$ records that $\mu$ has been sent, and does not record its delivery.

Process local states. We assume that local states have three components
$\ell_i = (Net_i, t_i, data_i)$, where $Net_i$ and $t_i$ are copies of the $Net$ and $t$ values from
the environment's state. The component $data_i$ serves as the data segment
for the process $i$. Its contents are a function of the protocol $P$ and the
transition function $\tau$.

⁵This holds true even if we allow message loss by setting $b_{ij} = \infty$.
The set $\mathcal{G}_0$ of initial global states. We assume that associated with $\gamma^{s}$
there is a set $Init_i$ of possible initial states for each process $i \in \mathbb{P}$. We define
$\mathcal{G}_0$ to be the set of global states $gl = (\ell_e, \ell_1, \ldots, \ell_n)$ satisfying: (1) $t_i = 0$
for all $i \in \mathbb{P}$ and $t = 0$; (2) the network components $Net$ and $Net_i$ are all
identical; (3) for every $i \in \mathbb{P}$, $hist_i = (init_i)$, with $init_i \in Init_i$; and (4) $Hist_e$
is the empty sequence.

Actions and external inputs. Associated with the context $\gamma^{s}$ are sets
Int of internal actions for the processes and sets Ext of external inputs,
respectively. For ease of exposition we assume that $\bot \in Ext$, where $\bot$
stands for the empty external input. Moreover, we generally assume that
$Ext \ne \{\bot\}$, so that there is at least one nontrivial possible external input.
We assume that processes can perform send actions and internal actions.
The local action $a_i(k)$ that $i$ contributes to the joint action in round $k + 1$
consists of a finite sequence of distinct send and internal actions. (Recall
that the local action is determined by the protocol, based on the local state.)
We use external inputs to model spontaneous events. They are generated
by the environment. In addition to external inputs, the environment is in
charge of message delivery. Thus, the environment's action $a_e(k)$ consists
of a finite sequence of external inputs to be delivered to various individual
processes, a subset $\pi$ of $\mathbb{P}$ that are activated in the current round, and a
(possibly empty) set of messages that are to be delivered in the current
round.

The environment's protocol. The environment in $\gamma^{s}$ is in charge of
delivering external inputs to processes and determining message deliveries.
For every global state $gl$ we define $P_e(gl)$ to be the set of actions
$a_e = (\sigma_x, \sigma_d)$ such that

1. $\sigma_x : \mathbb{P} \to Ext$ is a sequence assigning to each process $i \in \mathbb{P}$ an external
input (possibly the empty input $\bot$) it receives in the current round,
and

2. $\sigma_d$ is a sequence $(M_1, \ldots, M_{|\mathbb{P}|})$ where (i) for every $i \in \mathbb{P}$ the set
$M_i$ consists of messages that are in transit in $gl$, (ii) $M_i$ contains all
messages in transit to $i$ whose transmission time bounds, as specified
in $Net$, will be violated (expire) if the message is not delivered in the
current round, and (iii) none of the messages in $M_i$ would, if delivered
in the current round, violate the existing minimal
transmission time constraints.
Notice that $P_e$ is genuinely nondeterministic. Exactly one of the actions
in $P_e(gl)$ will be performed in global state $gl$ in any given instance. By
definition of $\mathcal{R}(P, \gamma^{s})$, however, if $r(k) = gl$ then the system contains a run
extending the prefix $r(0), \ldots, r(k)$ for every possible environment action in
$P_e(gl)$. Another point to note is that our definition does not enforce (and
hence does not assume) FIFO transmission; had we done so, channels would 
be considered to be queues, and the nondeterministic choices of messages 
to deliver would have to obey FIFO order. It should also be noted that the 
scheduler makes sure to comply with all existing constraints: minimal and 
maximal transmission times, as well as process rate. Finally, the fact that 
external inputs are delivered in a nondeterministic fashion implies they are 
not correlated in any way, and they do not depend on anything that happens 
before they are delivered. This is the sense in which external inputs can be 
viewed as independent, "spontaneous" events. 
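
The delivery constraints on the environment can be sketched for a single recipient as follows. This is a hedged illustration rather than part of the formal model: the function name and the encoding of in-transit messages are our own, and in particular we assume that each message carries the transmission time it would have if delivered in the current round (the field called elapsed below).

```python
from itertools import combinations
import math


def legal_delivery_sets(in_transit, bounds):
    """Enumerate the sets of messages the environment may deliver now.

    in_transit: list of (msg_id, channel, elapsed), where elapsed is the
                transmission time the message would have if delivered in
                the current round (an assumed convention).
    bounds:     dict mapping each channel to its (min, max) weights.
    """
    # Messages whose upper bound would be violated by any further delay.
    must = {m for m, ch, el in in_transit if el >= bounds[ch][1]}
    # Messages whose lower bound is already respected.
    may = {m for m, ch, el in in_transit if el >= bounds[ch][0]}
    optional = sorted(may - must)
    # Every legal set contains all of `must` and any subset of `optional`.
    return [must | set(extra)
            for k in range(len(optional) + 1)
            for extra in combinations(optional, k)]


bounds = {("i", "j"): (1, 3), ("k", "j"): (2, math.inf)}
in_transit = [
    ("m1", ("i", "j"), 3),   # upper bound reached: must be delivered
    ("m2", ("i", "j"), 1),   # lower bound met: may be delivered
    ("m3", ("i", "j"), 0),   # too early: may not be delivered yet
]
choices = legal_delivery_sets(in_transit, bounds)
unbounded = legal_delivery_sets([("m4", ("k", "j"), 5)], bounds)
```

The nondeterminism of the environment corresponds to the freedom of picking any one of the enumerated sets; on a channel with no upper bound, a message is never forced to be delivered.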

The transition function $\tau$. The transition function $\tau$ implements the
joint actions in a rather straightforward manner. In every round: (i) the
global clock variable $t$ and the local variables $t_i$ of all $i \in \pi$ are advanced by
one; (ii) a copy of the joint action is added to the environment's history log
$Hist_e$; and (iii) for every process $i$, a record of all current round message
deliveries and external inputs to the process is written in $data_i$. Note that
this record is overwritten in every round, so that a protocol must take special
measures in order to maintain a persistent copy of these contents.

1.6 Road Map 

This chapter has provided an outline of the necessary background upon
which the thesis is built: the causal analysis of distributed systems introduced by
Lamport, and the knowledge-based framework of Fagin et al. The rest
of the thesis describes novel results obtained as part of our research. We
conclude the chapter with a road map that offers a general outline of what the thesis
is all about.

Chapter 2 defines the formal and conceptual "playground" within which 
our research is conducted. We start by introducing the Ordered and the 
Simultaneous Response problems: two generic coordination problems that 
set constraints on the temporal ordering of events. The problem definitions 
involve a set of required responses to a spontaneous non-deterministic event. 
As we argue there, spontaneity is a required ingredient if we want to study 
those cases of coordination that necessitate information flow in the system.
This rather abstract notion of information flow is also given a formal
interpretation, in terms of knowledge gain. Apart from definitions, the chapter
also provides initial claims and their proofs. We study and prove the rela- 
tions between the types of response problems and correlated epistemic states 
of gained knowledge: nested knowledge is necessary and sufficient for ensur- 
ing correct solutions to the Ordered Response problem. Similarly, common 
knowledge is necessary and sufficient for Simultaneous Response. A discus- 
sion on the role of knowledge as an intermediate layer between causality and 
coordination concludes this chapter. 

Chapter 3 is the first in a series of four chapters that each study and 
characterize a particular coordination problem. This chapter studies the 
Ordered Response problem, but it also introduces several key notions that 
are utilized in the following chapters. We use the set of process-time pairs 
as the domain in which causal relations are defined. This domain is more 
suitable than the set of events, or of processes, given the synchronous char- 
acteristics of the system. The two most basic causal relations that we use 
are timing guarantees and syncausality, the latter being a generalization of 
Lamport's happened-before. 

In asynchronous systems, the correct ordering of more than two events 
requires the repeated application of the happened-before relation to each 
pair of subsequent events. A careful analysis of solutions to the simplest
cases of Ordered Response reveals that such ordering in synchronous systems
requires complex relations between all of the related process-time nodes. We
define a causal structure, the centipede, that combines both syncausality
and timing guarantees, and prove the Centipede Theorem, showing that the
existence of a centipede is necessary for ensuring the correct ordering of a
sequence of events. For ease of exposition, the formal results of this chapter
(as well as those of Chapters 4 through 6) are given using the $\gamma^{max}$ context,
in which only upper bounds are defined. We show however that the theorem 
also applies in two complementing boundary cases: one, where there are no 
upper bounds on delivery, and the other where delivery times are fixed. In 
the former case, the centipede structure is trivialized into a Lamport-style 
message chain. We conclude this chapter by showing that our definition is 
tight, in the sense that under some protocols the existence of the centipede 
is also sufficient for proper event ordering. We suggest the Full Information 
Protocol (fip) for synchronous systems for this purpose. 

Moving on to the next type of response problem, Chapter 4 investigates
the causal structures necessary for ensuring the simultaneous happening
of events. As such, it constitutes a complete break with existing analysis
of asynchronous causality, where no such constraint can be ensured. We
introduce the centibroom structure, a variant of the centipede, and prove
the Centibroom Theorem, an analog to the Centipede Theorem that shows 
the existence of a centibroom to be necessary for ensuring simultaneous 
actions. Sufficiency of the centibroom for such coordinated responses is also 
proved, under fip. As an application of the Centibroom Theorem we suggest 
two novel variants of a global snapshot algorithm for synchronous systems, 
one of which is shown to provide optimal time complexity. 

The particular form of the centibroom, and the results of Chapter 4 
showing that it is a prerequisite for common knowledge, provide a clear 
and graphic demonstration that the nature of common knowledge is fini- 
tistic, despite its familiar definition being based on an infinite conjunction 
of facts. Further investigation into the properties of the centibroom and of 
common knowledge is used to show that, roughly speaking, it takes time to 
obtain deeply nested knowledge without "collapsing" into common knowl- 
edge. Nevertheless, this result is shown to be dependent upon the protocol 
being followed. A counterexample is suggested where every level of nested 
knowledge may be achieved without common knowledge ensuing. 

Chapters 5 and 6 deal with generalizations of both the Ordered Re- 
sponse and the Simultaneous Response problems. First, the Ordered Group 
Response problem is defined, which can be seen as an immediate "merge" of 
Ordered and Simultaneous Response requirements. In analogy, the general- 
ized centipede structure is defined, and is shown to be necessary in solutions 
to the Ordered Group Response problem. Then we take the generalization 
even further and define the Generalized Response Problem, where the re- 
quired temporal ordering of events can be specified using any partial order. 
Characterization is provided in terms of sets of generalized centipedes. Our 
understanding of common knowledge is advanced further by showing how 
such an epistemic state is dependent upon the joint histories of the processes
in the group. 

Chapter 7 takes a different stance from the one followed to this point. 
In epistemic terms, each of the response problems we considered thus far 
is reduced into a rather complex requirement concerning knowledge about 
knowledge, which is then reduced further into a causal communication condi- 
tion that is highly dependent upon the existence of upper bounds on message 
delivery times. In Chapter 7 we consider the complementary approach: we 
ask what is the causal condition that will ensure knowledge about ignorance 
rather than knowledge about knowledge. This leads us into a more detailed 
discussion of causal cones of influence, from which the conditions for such
ignorance are then distilled. As it turns out, it is the existence of lower
bounds on transmission times that makes such knowledge, which is of value
in competitive settings, possible.

Chapter 8 brings the thesis to conclusion and discusses possible further 
research and various open questions. 

1.7 Related Work 

A great abundance of work pertaining to Lamport's happened-before rela- 
tion has been collected over the years, and we shall make no attempt to 
provide a survey of this work. A thorough report can be found in [48]. Two 
of the most widely known works that build atop it are Mattern's general- 
ization of Lamport's scalar clocks into vector clocks in [30, 17] and Chandy 
and Lamport's utilization in the snapshot algorithm [6] which is further dis- 
cussed in Chapter 4. Another related work is the Chandy and Misra paper 
[7] discussed in Section 1.1. 

Formal study of knowledge, and knowledge about knowledge, touches 
on many fields, ranging from philosophy [29] and psychology [9], to linguis- 
tics [20, 39], economics [2], AI [31], cryptography [12, 46, 19] and distributed 
systems [23, 7, 40]. The interpreted systems framework, epitomized in [15], 
stands at the base of related research in distributed computing [36, 13, 24].

Explicit and implicit use of time bounds, introduced in Chapter 3, for 
coordination and improved efficiency is ubiquitous in distributed comput- 
ing. An elegant example of its use is made by Hadzilacos and Halpern in 
[21]. That knowledge can be gained by way of Null messages when timing 
guarantees are available has been part of the folklore for decades. To our 
knowledge, Lamport [27] was first to explicitly explore the use of Null mes- 
sages beyond their customary timeout semantics. In effect Lamport's state 
machine protocols in that paper are based on an implicit notion of causality 
which we will later (see Chapter 3) define as syncausality, yet no attempt 
at rigorous formalization is made there, and the general role of time bounds 
is not developed. A tutorial by Moses [34] suggests as a viable topic for 
future work performing an explicit analysis of the effect of Null messages 
on knowledge gain. He also presents an example in which communication 
can be saved by using timeouts. However, [34] does not suggest modify- 
ing Lamport causality to suit synchronous systems, and none of the new 
notions or technical results in this thesis were suggested in [34]. Krasucki
and Ramanujam [25] study the interaction between knowledge and the
ordering of events in a distributed system. They consider concurrency in
a rather abstract setting, where they show that causality is related to the
existence of particular partially ordered sets. They do not explicitly study
the synchronous model, however, and do not explicitly consider synchronous
time bounds on channels. Moses and Bloom [35] perform a knowledge-based
analysis of clock synchronization in the presence of bounds on transmission
times. They generalize Lamport's relation by defining a notion of timed
causality $e \xrightarrow{\alpha} e'$ that corresponds to $e$ taking place at least $\alpha$ time units
before $e'$. It appears that this relation is a quantitative generalization of Lamport
causality for the purpose of determining relative timing of events. A similar
notion appears in the work of Patt-Shamir and Rajsbaum [43].

Chapter 4 deals extensively with relations between nested knowledge, 
common knowledge and time. The growing body of research dealing with 
the dynamics of interactive epistemology has brought to light some of the 
intricate relations here [23, 15, 50]. Halpern and Moses [23] proved that com- 
mon knowledge cannot be gained in the face of unreliable or asynchronous 
communication. Parikh and Ramanujam [41] investigate nested knowledge 
in connection to formal languages. Common knowledge is typically perceived
in terms of an infinite conjunction of $E^k$, for $k > 0$. There are also definitions
of common knowledge in terms of a fixed point (see, e.g., [29, 15, 5]). Fischer
and Immerman [18] first showed that the levels of nested knowledge that can
be achieved without "collapsing" into common knowledge are bounded in
finite state systems. The combinations of nested and common knowledge
that are discussed in Chapter 5 are, interestingly, somewhat similar to those 
found in Chwe's [8] game theoretic analysis of coordination.

Chapter 2



Response Problems and 
Knowledge in Synchronous 
Systems 

2.1 Studying Coordination via Response Problems 

Existing literature about causality in distributed systems deals almost exclu- 
sively with asynchronous systems. This is not really surprising, given that a 
major application for Lamport's happened-before is in providing some sort 
of synchronized layer on top of asynchronous communication networks, and 
that in synchronous systems such a layer is already provided for. 

Yet in distributed systems, being able to share a certain global sense 
of time is, in most cases, only a means to an end, the real purpose being 
coordinating events across remote sites. Is the coordination of events in a 
synchronous system as easy as looking at the clock? In some cases yes. Con- 
sider a simple system where Zoe and Xerxes operate based on a prearranged 
protocol that ensures that Zoe will pick Xerxes up for the movies at 7:45.
Come 7:45, Xerxes looks at the clock, gets up and goes outside and into
Zoe's car, which has just come by.

At other times though, a global clock is not enough to ensure coordina- 
tion. If Zoe's arrival hinges upon her getting through all work meetings by 
6:50 (an unpredictable occurrence), then Xerxes may find himself alone at 
7:45. In synchronous systems that allow for nondeterministic occurrences, 
a global clock cannot by itself ensure proper coordination across sites, if 
all events hinge upon some initial nondeterministic occurrence as a trigger. 
And yet many coordination tasks depend upon external input, whether in
the form of timing or of an assignment to an unknown parameter, as a
trigger. Such external input is, for all practical purposes, nondeterministic. 

We capture the essence of such coordination tasks in the following manner.
We identify a particular spontaneous external input as a triggering
event, denoted by $e_t$. A run in which the trigger $e_t$ occurs is said to be
triggered. An intended response to such a trigger is specified by a pair
$\alpha_h = \langle a_h, i_h \rangle$ with $a_h$ being an action for process $i_h$.^ A response takes
place if process $i_h$ performs the action $a_h$. An instance of the Ordered
Response problem is parametrized by a tuple $(e_t, \alpha_1, \ldots, \alpha_k)$, consisting of a
trigger and a sequence of responses. Formally, we define the following class
of problems.

Definition 2 (Ordered Response) A protocol $P$ solves the instance $\mathrm{OR} =
(e_t, \alpha_1, \ldots, \alpha_k)$ of the Ordered Response problem if it guarantees that

1. in a triggered run, every response $\alpha_h$, for $h = 1, \ldots, k$, will occur;
moreover, if $h < k$ then $\alpha_h$ will happen before (i.e., no later than)
$\alpha_{h+1}$ does. Finally,

2. none of the responses $\alpha_h$ occur in runs that are not triggered.
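The two clauses of the definition can be read as a check on individual runs. The sketch below is only an illustration under stated assumptions: a run is represented by a hypothetical record holding a `triggered` flag and a `times` map from the response index h to the time at which that response occurred, and the function names are invented for the example. A finite set of runs can refute that a protocol solves OR, but passing the check on a sample of runs does not establish it.

```python
def satisfies_ordered_response(run, k):
    """Check one finished run against the two clauses of Definition 2.

    `run` is a hypothetical record: {"triggered": bool, "times": {h: time}},
    where times[h] is the time at which response h (1..k) occurred.
    """
    times = run["times"]
    if not run["triggered"]:
        # Clause 2: no response may occur in an untriggered run.
        return len(times) == 0
    # Clause 1: every response occurs ...
    if any(h not in times for h in range(1, k + 1)):
        return False
    # ... and response h happens no later than response h + 1.
    return all(times[h] <= times[h + 1] for h in range(1, k))


def consistent_with_ordered_response(runs, k):
    """True when none of the given runs violates the specification."""
    return all(satisfies_ordered_response(r, k) for r in runs)
```

Note that equal times are accepted, matching the reading of "before" as "no later than".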

Consider the following simplified scenario, where such a problem is im- 
plied. 

Example 1 Charlie's bank account is temporarily suspended due to credit
problems. Should Charlie make a sufficient deposit at his local branch,
Banker Bob at headquarters will re-activate the account. Alice holds a cheque
from Charlie, but trying to cash it before the account is re-activated will grant
her a fine, rather than cash.

Alice, Bob and Charlie can communicate over a communication network
as depicted in Figure 2.1a, where the labels represent maximal transmission
times. In particular, messages from Charlie to Bob and Alice take up to 10
and 12 days to be delivered, respectively. This scenario can be viewed as an
instance of OR in which a deposit by Charlie is the triggering event, and the
responses are the account re-activation by Bob followed by Alice's cashing of
the cheque.

^ For simplicity, we assume that $e_t$ happens at most once in any given run, as does each
one of the actions performed in response to it.

Figure 2.1: Example 1. (a) The network of Example 1. (b) Bob is notified by Alice.

Intuitively, we expect Alice, Bob and Charlie to communicate in order to
ensure a proper ordering of events. Indeed, as shown by Chandy and Misra,
if the network were asynchronous there would be no other way to ensure
correct ordering except by a message chain linking Charlie to Bob, and then
to Alice. The synchronous system of Example 1 offers more freedom. For
example, Figure 2.1b shows Charlie sending word to Alice, who sends on
the message to Bob and waits for 5 rounds to make sure of its arrival before
safely cashing her cheque. We use a dashed arrow to denote that a message
sent by Alice at t + 3 is sure to arrive at Bob's by t + 8.
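The timing argument behind Figure 2.1b is simple arithmetic, and a small sketch makes it explicit. The values (Alice hears from Charlie at t + 3, and the bound from Alice to Bob is 5 rounds) are taken from the text. The function names are hypothetical, and two assumptions are made only for illustration: Bob re-activates the account in the round in which the forwarded message arrives, and admissible delays range from 1 round up to the bound.

```python
def guaranteed_arrival(send_time, max_delay):
    """Latest time by which a message sent at send_time is delivered,
    given an upper bound max_delay on the channel's transmission time."""
    return send_time + max_delay


def alice_cash_time(alice_receives_at, bound_alice_to_bob):
    """Alice forwards Charlie's news on receipt, then waits out the bound."""
    return guaranteed_arrival(alice_receives_at, bound_alice_to_bob)


def ordering_holds(alice_receives_at, bound_alice_to_bob):
    """Check every admissible delay of the forwarded message (assumed to be
    1..bound rounds): Bob, who re-activates on delivery, is never later."""
    cash = alice_cash_time(alice_receives_at, bound_alice_to_bob)
    return all(
        alice_receives_at + delay <= cash
        for delay in range(1, bound_alice_to_bob + 1)
    )


t = 0  # time of Charlie's deposit, taken as 0 for illustration
print(alice_cash_time(t + 3, 5))  # 8, i.e. t + 8
print(ordering_holds(t + 3, 5))   # True
```

Alice never learns when the message actually reached Bob; the upper bound alone lets her conclude that it has arrived by the time she acts.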

The Ordered Response problem captures a natural coordination scenario, 
and its precise specification provides us with a clearly defined scope within 
which synchronous causality can be investigated. Chapter 3 studies it further
and establishes the exact scope of flexibility in communication that
is allowed for establishing ordering in such synchronous systems. As we will
see in the following sections of this chapter, knowledge plays a key role in 
providing such characterizations. 

We now turn to define another type of coordination problem. In syn- 
chronous systems it is often desirable to perform actions simultaneously at 
different sites, a classic example being the firing squad problem [10]. A 
natural variant of OR is the Simultaneous Response problem, defined as 
follows. 

Definition 3 (Simultaneous Response) Let $e_t$ be an external input. Then
$\mathrm{SR} = (e_t, \alpha_1, \ldots, \alpha_k)$ defines an instance of the Simultaneous Response
problem. A protocol solves the instance SR if it guarantees that if the triggering
event $e_t$ occurs, then at some later point all actions $\alpha_1, \ldots, \alpha_k$ in the response
set of SR will be performed simultaneously.

A causal analysis of the Simultaneous Response problem will be conducted
in Chapter 4. Note that the Simultaneous Response problem can
be characterized by means of multiple ordering response problems. Let
$\mathrm{SR} = (e_t, \alpha_1, \ldots, \alpha_k)$. For each pair of required simultaneous responses
$\alpha, \beta \in \{\alpha_1, \ldots, \alpha_k\}$, we define two ordering response problems: $\mathrm{OR}^{\alpha\beta} =
(e_t, \alpha, \beta)$ and $\mathrm{OR}^{\beta\alpha} = (e_t, \beta, \alpha)$. A protocol that solves both $\mathrm{OR}^{\alpha\beta}$ and
$\mathrm{OR}^{\beta\alpha}$ will solve the simultaneous subproblem $\mathrm{SR}^{\alpha\beta} = (e_t, \alpha, \beta)$. A protocol
that solves both $\mathrm{OR}^{\alpha\beta}$ and $\mathrm{OR}^{\beta\alpha}$ for every $\alpha, \beta \in \{\alpha_1, \ldots, \alpha_k\}$ will solve SR.
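The reduction of simultaneity to pairs of ordering constraints can be checked on concrete response times. The following sketch is illustrative only: it looks at the timing constraint within a single triggered run, represents that run by a hypothetical map `times` from response names to occurrence times, and uses invented function names.

```python
from itertools import permutations


def ordered(times, first, second):
    """OR-style constraint: `first` happens no later than `second`."""
    return times[first] <= times[second]


def simultaneous_via_pairwise_or(times):
    """True when every ordered pair of responses satisfies the OR constraint,
    i.e. each pair is ordered in both directions."""
    return all(ordered(times, a, b) for a, b in permutations(times, 2))


def simultaneous(times):
    """Direct check: all responses share a single time."""
    return len(set(times.values())) <= 1
```

Since "no later than" in both directions forces equality, the two checks agree on every assignment of times. For instance, both return `True` on `{"a1": 4, "a2": 4, "a3": 4}` and `False` on `{"a1": 4, "a2": 5}`.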

Nevertheless, defining the simultaneous requirement as a separate prob- 
lem is worthwhile because solutions to such problems give rise to tighter 
epistemic characterizations, in the form of common knowledge among the 
set of responding processes. Chapters 5 and 6 will investigate aspects of 
generalizing the ordering requirements beyond ordered and simultaneous 
responses. 

2.2 Knowledge Gain in Synchronous Systems 

As noted in Chapter 1, Chandy and Misra used knowledge gain to refer to 
a scenario wherein process $i$ "gains knowledge", or "learns", that some fact
$\varphi$ pertaining to process $j$ holds. In an asynchronous system, as it turns out,
the only way for i to gain knowledge about j is by means of a message chain 
relating the two. Thus, in the asynchronous setting, knowledge gain reflects 
the way information flows in the system. 

Taken at face value, Chandy and Misra's notion no longer captures infor- 
mation flow when we move to synchronous systems, as process i may learn 
facts about process $j$ by a mere glance at the clock (for example, by noting
that my watch shows 4pm I "learn" that your watch shows 4pm too right
now, something that I did not know before, while it was still 3:59). In order
to maintain the desirable association with information flow we turn, as we 
did in the previous section, to nondeterministic occurrences. 

Definition 4 (Knowledge gain in synchronous systems) We will say
that knowledge gain occurs in the interval $[t..t']$ of run $r$ whenever a
nondeterministic event occurs at some process $j$ no sooner than time $t$, and
process $i$ knows of this occurrence by time $t'$.

At an intuitive level, we expect that knowledge of an ND (nondetermin- 
istic) event is dependent upon communication, and hence knowledge gain of 
such facts will reflect information flow. The following chapters will pursue 
the relations between knowledge gain and communication. Natural
generalizations of the above notion of knowledge gain are nested knowledge gain
and common knowledge gain.

Nested knowledge gain occurs within the interval $[t..t']$ of run $r$ if an
ND event $e$ occurs at process $j$ within the interval, and by the end of the
interval $K_{i_h} K_{i_{h-1}} \cdots K_{i_1} \mathsf{occurred}(e)$ holds for some sequence of processes
$(i_1, i_2, \ldots, i_h)$. In a similar vein, common knowledge gain occurs when, by the
end of the interval of time, $C_G \mathsf{occurred}(e)$ holds for some group of processes
$G$.

The suggestive similarity of nested and common knowledge to the Or- 
dered and Simultaneous Response problems respectively, will be examined 
in Section 2.3. 
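The difference between nested and common knowledge can be made concrete on a toy model. The sketch below is illustrative and rests on assumptions: a handful of named "worlds" stand in for the points (r, t) of a system, `local[i][w]` is a hypothetical encoding of process i's local state at world w, and the function names are invented. Knowledge is computed in the standard way, as truth in all worlds the process cannot distinguish from the current one, and common knowledge as the greatest fixed point of X = E_G(fact and X).

```python
def knows(worlds, local, i, fact):
    """K_i: the set of worlds where process i knows `fact`.

    `fact` is the set of worlds where the fact holds. Process i cannot tell
    apart two worlds in which its local state is the same.
    """
    return {
        w for w in worlds
        if all(v in fact for v in worlds if local[i][v] == local[i][w])
    }


def nested(worlds, local, chain, fact):
    """K_{i_h} ... K_{i_1} fact, for chain = [i_1, ..., i_h]."""
    for i in chain:
        fact = knows(worlds, local, i, fact)
    return fact


def common(worlds, local, group, fact):
    """C_G fact, computed as the greatest fixed point of X = E_G(fact and X)."""
    x = set(worlds)
    while True:
        target = fact & x
        nxt = set(worlds)
        for i in group:
            nxt &= knows(worlds, local, i, target)
        if nxt == x:
            return x
        x = nxt


# Event e may occur at A; B learns of it only if A's message has arrived.
worlds = {"none", "sent", "got"}
local = {
    "A": {"none": "no_e", "sent": "e", "got": "e"},
    "B": {"none": "no_msg", "sent": "no_msg", "got": "msg"},
}
occurred = {"sent", "got"}
```

In this model B knows that A knows that e occurred exactly at `"got"`, A never knows that B knows (A cannot tell `"sent"` from `"got"`), and consequently `common(worlds, local, ["A", "B"], occurred)` is empty.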

2.3 Relating Response Problems and Knowledge 
Gain 

This section charts out the formal relations between the two previously 
defined response problems, and knowledge gain. 

2.3.1 Response Problem to Knowledge Gain 

We start by looking at the Ordered Response problem. Intuitively, the 
coordination of responses so that they occur in a particular sequence suggests 
that knowledge gain is involved. In order to ensure that the events occur in 
sequence, each responder k must know that the previous k — 1 responses, as 
well as the trigger event, had already occurred (or are occurring right now). 

We now show that this is indeed the case. A caveat concerning knowl- 
edge is that we must assume of processes that they do not forget that they 
had already performed a response. Formally, this property is described as 
follows. 

Definition 5 (Response recall) Let OR be defined by $(e_t, \alpha_1, \ldots, \alpha_k)$ and
assume that $\mathcal{R} = \mathcal{R}(P, \gamma)$ is a system of runs for a protocol $P$ where all of
the responses may occur, and $\gamma$ is any arbitrary context. Protocol $P$ recalls
responses for OR if for all $1 \le h \le k$, $r, r' \in \mathcal{R}$ and $t' \le t$, if $\alpha_h$ occurs at
$(i_h, t')$ in $r$ and $(r, t) \sim_{i_h} (r', t)$, then $\alpha_h$ also occurs at $(i_h, t')$ in $r'$.

As we'll show in Section 3.6, the response recall assumption is not needed
when Ordered Response is related directly to communication, rather than
to knowledge. Note that the following theorem, and Theorem 2 as well, are
not dependent on the particular context $\gamma$ being used.

Theorem 1 Let $\mathrm{OR} = (e_t, \alpha_1, \ldots, \alpha_k)$ be an instance of OR, and assume
that protocol $P$ solves OR in $\gamma$ and that it recalls responses for it. Let $r \in \mathcal{R}$
be a run in which $e_t$ occurs, let $1 \le h \le k$, and let $t_h$ be the time at which
$i_h$ performs action $a_h$ in $r$. Then

$$(\mathcal{R}, r, t_h) \models K_{i_h} K_{i_{h-1}} \cdots K_{i_1} (\mathsf{occurred}(e_t) \wedge \mathsf{ND}(e_t)).$$

Proof We prove the theorem by induction on $k$.

$k = 1$: By definition of $r$, process $i_1$ performs $a_1$ at $t_1$. If $(\mathcal{R}, r, t_1) \not\models
K_{i_1} \mathsf{occurred}(e_t)$ then by definition of $\models$ there exists a run $r' \in \mathcal{R}$ such
that $(r, t_1) \sim_{i_1} (r', t_1)$ and where $(\mathcal{R}, r', t_1) \not\models \mathsf{occurred}(e_t)$. Yet as
$P$ solves OR, in $r'$ action $a_1$ is performed only if $e_t$ has occurred,
contradiction. Therefore it must be that $(\mathcal{R}, r, t_1) \models K_{i_1} \mathsf{occurred}(e_t)$.

By definition of OR, $e_t$ is an external input event and hence nondeterministic.
This is universally true in the system, and hence $(\mathcal{R}, r, t_1) \models
K_{i_1} \mathsf{occurred}(e_t)$ implies $(\mathcal{R}, r, t_1) \models K_{i_1} (\mathsf{occurred}(e_t) \wedge \mathsf{ND}(e_t))$.

$k > 1$: Suppose that it is the case that

$$(\mathcal{R}, r, t_k) \not\models K_{i_k} K_{i_{k-1}} \cdots K_{i_1} (\mathsf{occurred}(e_t) \wedge \mathsf{ND}(e_t)).$$

Then by definition of $\models$ there exists a run $r' \in \mathcal{R}$ such that $(r, t_k) \sim_{i_k}
(r', t_k)$ and where $(\mathcal{R}, r', t_k) \not\models K_{i_{k-1}} K_{i_{k-2}} \cdots K_{i_1} (\mathsf{occurred}(e_t) \wedge \mathsf{ND}(e_t))$.
Since the protocol recalls responses, we now obtain that

$$(\mathcal{R}, r', t_{k-1}) \not\models K_{i_{k-1}} K_{i_{k-2}} \cdots K_{i_1} (\mathsf{occurred}(e_t) \wedge \mathsf{ND}(e_t)).$$

However, again as $P$ solves OR, it also solves the sub-problem $\mathrm{OR}'$
defined by $(e_t, \alpha_1, \ldots, \alpha_{k-1})$. As $a_k$ is also performed by $i_k$ in $r'$,
it must be that $a_1, \ldots, a_{k-1}$ too get performed in $r'$. By the inductive
hypothesis we get that for all $1 \le h \le k - 1$

$$(\mathcal{R}, r', t_h) \models K_{i_h} K_{i_{h-1}} \cdots K_{i_1} (\mathsf{occurred}(e_t) \wedge \mathsf{ND}(e_t)).$$

In particular we get that

$$(\mathcal{R}, r', t_{k-1}) \models K_{i_{k-1}} \cdots K_{i_1} (\mathsf{occurred}(e_t) \wedge \mathsf{ND}(e_t)).$$

This contradicts the previous result, and therefore it must be the case
that

$$(\mathcal{R}, r, t_h) \models K_{i_h} K_{i_{h-1}} \cdots K_{i_1} (\mathsf{occurred}(e_t) \wedge \mathsf{ND}(e_t))$$

for all $1 \le h \le k$, as required.

□ Theorem 1



We now turn to consider the Simultaneous Response problem. Here 
too there is an intuitive connection with knowledge gain. If all responses 
are always performed simultaneously, then every responding site must know 
that the other sites are responding too. Yet as the next theorem shows, 
the simultaneous response requirement implies an even stronger epistemic 
condition, in the form of common knowledge. As all responses must occur 
simultaneously, there is actually no need here to assume the response recall 
property from the arbitrary protocol. 

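Theorem 2 Let $\mathrm{SR} = (e_t, \alpha_1, \ldots, \alpha_k)$, and assume that protocol $P$ solves SR
in $\gamma$. Moreover, let $G = \{i_1, \ldots, i_k\}$ be the set of processes appearing in the
response set of SR. Finally, let $r \in \mathcal{R}$ be a run in which $e_t$ occurs, and
let $t$ be the time at which the response actions are performed in $r$. Then
$(\mathcal{R}, r, t) \models C_G(\mathsf{occurred}(e_t) \wedge \mathsf{ND}(e_t))$.

Proof Fix $h, g \in \{1..k\}$. We first show that

$$\mathcal{R} \models \mathsf{occurs}(\alpha_h) \rightarrow E_G(\mathsf{occurs}(\alpha_h) \wedge \mathsf{occurred}(e_t)).$$

Choose $r', t'$ such that $(\mathcal{R}, r', t') \models \mathsf{occurs}(\alpha_h)$. Note that since $P$ solves
SR and since response actions are performed only upon the occurrence
of the trigger event $e_t$, we get $(\mathcal{R}, r', t') \models \mathsf{occurs}(\alpha_h) \leftrightarrow \mathsf{occurs}(\alpha_g)$ and
$(\mathcal{R}, r', t') \models \mathsf{occurs}(\alpha_h) \rightarrow \mathsf{occurred}(e_t)$. From the former equivalence and
$(\mathcal{R}, r', t') \models \mathsf{occurs}(\alpha_h)$ we obtain that $(\mathcal{R}, r', t') \models \mathsf{occurs}(\alpha_g)$. Since performing
a local action is written, at least for the current round, in the process's
local state, we obtain that $(\mathcal{R}, r', t') \models K_{i_g} \mathsf{occurs}(\alpha_g)$. Now using the former
equivalence again we get that $(\mathcal{R}, r', t') \models K_{i_g} \mathsf{occurs}(\alpha_h)$, and using the
latter implication we get $(\mathcal{R}, r', t') \models K_{i_g} \mathsf{occurred}(e_t)$. Putting these results
together we conclude that $(\mathcal{R}, r', t') \models K_{i_g}(\mathsf{occurs}(\alpha_h) \wedge \mathsf{occurred}(e_t))$. Since
$g$ is arbitrarily chosen in $G$, we get $(\mathcal{R}, r', t') \models E_G(\mathsf{occurs}(\alpha_h) \wedge \mathsf{occurred}(e_t))$,
from which it follows that

$$(\mathcal{R}, r', t') \models \mathsf{occurs}(\alpha_h) \rightarrow E_G(\mathsf{occurs}(\alpha_h) \wedge \mathsf{occurred}(e_t))$$

by our choice of $r', t'$. As false antecedents imply anything, we conclude that
$\mathcal{R} \models \mathsf{occurs}(\alpha_h) \rightarrow E_G(\mathsf{occurs}(\alpha_h) \wedge \mathsf{occurred}(e_t))$.

Recall the Knowledge Induction Rule, that derives $\mathcal{R} \models \varphi \rightarrow C_G \psi$ from
$\mathcal{R} \models \varphi \rightarrow E_G(\varphi \wedge \psi)$. Setting $\varphi = \mathsf{occurs}(\alpha_h)$ and $\psi = \mathsf{occurred}(e_t)$ we
apply the rule, and based on the above result obtain $\mathcal{R} \models \mathsf{occurs}(\alpha_h) \rightarrow
C_G \mathsf{occurred}(e_t)$. We conclude by noting that $(\mathcal{R}, r, t) \models \mathsf{occurs}(\alpha_h)$ by
assumption, and hence also $(\mathcal{R}, r, t) \models C_G \mathsf{occurred}(e_t)$.

By definition of SR, $e_t$ is an external input event and hence nondeterministic.
This is universally true in the system, and hence in particular
$(\mathcal{R}, r, t) \models C_G \mathsf{occurred}(e_t)$ implies $(\mathcal{R}, r, t) \models C_G(\mathsf{occurred}(e_t) \wedge \mathsf{ND}(e_t))$.

□ Theorem 2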

Theorems 1 and 2 show that, in a precise sense, knowledge gain is a 
prerequisite for coordinated response. The next section will show that nested 
and common knowledge gain indeed characterize ordered and simultaneous 
responses, in the sense that they define the minimal epistemic prerequisites 
for such types of coordination. 

2.3.2 Knowledge Gain to Response Problem 

It is immediately apparent that no general law exists showing that knowledge 
gain implies a solution to a response problem, for processes are not, in 
general, required to act in any way upon the knowledge that they gain. 

In order to bridge the gap between knowledge and action, we would 
need to add requirements on the protocol being followed by the processes. 
Since, as stated in Section 1.2, this thesis is focussed on producing protocol- 
independent results, we do not delve deeply into such additions. ^ 

Nevertheless, as we will show, there exist protocols where indeed knowl- 
edge gain implies a solution to the response problem. Proving the existence 
of such a protocol comes to show that the relevant response problem (say 
ordered response) , is indeed characterized by the related type of knowledge 
gain (in this case, nested knowledge gain). The existence of such a protocol 
shows that, in general, no epistemic state stronger than nested knowledge 
can be gained as a result of solving the ordering response problem. ^ 

To exemplify the existence of protocols where nested knowledge gain 
implies a solution to OR, we introduce the following property for protocols. 

Definition 6 (Non-Hesitant Protocol) Protocol $P$ is non-hesitant with
respect to an Ordered Response problem $\mathrm{OR} = (e_t, \alpha_1, \ldots, \alpha_k)$ if for each $h \le
k$, process $i_h$ performs $a_h$ as soon as $K_{i_h} K_{i_{h-1}} \cdots K_{i_1} (\mathsf{occurred}(e_t) \wedge \mathsf{ND}(e_t))$
is established, but no sooner.

^ One could try, for example, to characterize those protocols where knowledge gain does
imply a solution to the related response problem.

^ A similar argument is presented in Sections 3.9 and 4.5, in order to show that the yet
to be defined centipede and centibroom communication patterns characterize nested and
common knowledge gain, respectively.

As the following lemma shows, nested knowledge gain is sufficient for
solving OR problems in protocols that are non-hesitant with respect to the
problem. As for the existence of actual protocols that comply with the above
definition, we can make things easy by assuming the context $\gamma^{max}$ (as we do
in the next two chapters). Here it is easy to find protocols that satisfy the
non-hesitance, as well as the consideracy (defined below), qualifications by insisting
that processes timestamp their messages.

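Lemma 1 Let $\mathrm{OR} = (e_t, \alpha_1, \ldots, \alpha_k)$ and let $P$ be a non-hesitant protocol
with respect to OR. If for every $r \in \mathcal{R}^{max} = \mathcal{R}(P, \gamma^{max})$ such that $e_t$ occurs
at $(i_0, t)$ in $r$ there exists time $t'$ such that

$$(\mathcal{R}^{max}, r, t') \models K_{i_k} K_{i_{k-1}} \cdots K_{i_1} (\mathsf{occurred}(e_t) \wedge \mathsf{ND}(e_t)),$$

then $P$ solves OR.

Proof Assume $r \in \mathcal{R}^{max}$ such that $e_t$ occurs at $(i_0, t)$ in $r$. From

$$(\mathcal{R}^{max}, r, t') \models K_{i_k} K_{i_{k-1}} \cdots K_{i_1} (\mathsf{occurred}(e_t) \wedge \mathsf{ND}(e_t))$$

we obtain the existence of $t_k \le t'$ such that

(a) $(\mathcal{R}^{max}, r, t_k) \models K_{i_k} K_{i_{k-1}} \cdots K_{i_1} (\mathsf{occurred}(e_t) \wedge \mathsf{ND}(e_t))$, and

(b) $t_k = 0$ or $(\mathcal{R}^{max}, r, t_k - 1) \not\models K_{i_k} K_{i_{k-1}} \cdots K_{i_1} (\mathsf{occurred}(e_t) \wedge \mathsf{ND}(e_t))$.

By repeated applications of the Knowledge Axiom we extend this result
into a series $t \le t_1 \le \cdots \le t_k \le t'$ such that for every $h \le k$

(a) $(\mathcal{R}^{max}, r, t_h) \models K_{i_h} K_{i_{h-1}} \cdots K_{i_1} (\mathsf{occurred}(e_t) \wedge \mathsf{ND}(e_t))$, and

(b) $t_h = 0$ or $(\mathcal{R}^{max}, r, t_h - 1) \not\models K_{i_h} K_{i_{h-1}} \cdots K_{i_1} (\mathsf{occurred}(e_t) \wedge \mathsf{ND}(e_t))$.

As $P$ is non-hesitant with respect to OR, we get that for every $h \le k$
process $i_h$ performs $a_h$ at $t_h$, and we are done. ■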
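The way the proof extracts the series of response times can be illustrated with a small table. The sketch is hypothetical throughout: `table[h][t]` is an assumed record of whether the nested formula of depth h + 1 holds at time t of a fixed run, and the function names are invented. It shows that once the table respects the Knowledge Axiom (a deeper formula holding at a time implies the shallower one holds at that time), the first times at which the formulas hold form a nondecreasing series, which is exactly the schedule a non-hesitant protocol follows.

```python
def first_time(holds_at):
    """Index of the first time at which a formula holds, or None."""
    for t, value in enumerate(holds_at):
        if value:
            return t
    return None


def respects_knowledge_axiom(table):
    """table[h][t] says whether the depth-(h+1) nested formula holds at time t.
    The Knowledge Axiom forces: the deeper formula holds => the shallower holds."""
    return all(
        (not deeper) or shallower
        for shallow_row, deep_row in zip(table, table[1:])
        for shallower, deeper in zip(shallow_row, deep_row)
    )


def non_hesitant_schedule(table):
    """Times t_1, ..., t_k at which a non-hesitant protocol would respond."""
    return [first_time(row) for row in table]
```

For example, with rows `[F, F, T, T, T]`, `[F, F, F, T, T]` and `[F, F, F, T, T]` (writing `T` and `F` for `True` and `False`), the schedule is `[2, 3, 3]`.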

Once again switching to the Simultaneous Response problem, we need a 
protocol where processes are more considerate, in order to ensure a solution 
to the problem. 

Definition 7 (Considerate protocol) Protocol $P$ is considerate with
respect to a Simultaneous Response problem $\mathrm{SR} = (e_t, \alpha_1, \ldots, \alpha_k)$ if for each
$h \le k$, process $i_h$ performs $a_h$ as soon as $C_{\{i_1, \ldots, i_k\}}(\mathsf{occurred}(e_t) \wedge \mathsf{ND}(e_t))$ is
established, but no sooner.

Lemma 2 Let $\mathrm{SR} = (e_t, \alpha_1, \ldots, \alpha_k)$, let $G = \{i_1, \ldots, i_k\}$ and let $P$ be a
considerate protocol with respect to SR. If for every $r \in \mathcal{R}^{max} = \mathcal{R}(P, \gamma^{max})$
such that $e_t$ occurs at $(i_0, t)$ in $r$ there exists time $t'$ such that $(\mathcal{R}^{max}, r, t') \models
C_G(\mathsf{occurred}(e_t) \wedge \mathsf{ND}(e_t))$, then $P$ solves SR.

The lemma's proof is immediate if we consider that $C_G \varphi \rightarrow K_g C_G \varphi$ for
any $g \in G$.

Chapter 3

Gaining Nested Knowledge 



3.1 Introduction 

This chapter investigates the minimal communication needed in order to 
achieve nested knowledge gain in synchronous systems. As such, its re- 
sults provide an immediate generalization of previous findings pertaining to 
asynchronous ones. 

We have argued elsewhere (see Chapter 2) that knowledge gain can be 
seen as a close approximation of the spread of causal effect in distributed 
systems. Yet even though it is more rigorously defined than causality, knowl- 
edge is still a rather abstract notion. Thus, we motivate our investigation by 
studying the more concrete Ordered Response problem. We will show that 
in order for a protocol to solve the problem, a certain generalization of mes- 
sage chains must relate the trigger and responding sites in every triggered 
run. 

Sections 3.1 through 3.5 will introduce and discuss the new concepts
involved in the analysis, and informally sketch out the results in terms of a
necessity relation tying communication to Ordered Response solutions.
Sections 3.6 to 3.9 will then retrace our steps and provide the formal
underpinnings that uphold these results. The methodology, as discussed
in Section 1.2, will be to prove that certain communication patterns are
necessary in order for knowledge gain to arise, and then to use Theorem 1 to
similarly relate these patterns to an Ordered Response.

For the sake of clear presentation, we assume throughout this chapter
and the next one that all examples and proofs take place over a synchronous
system in which upper bounds are given for every existing communication
channel. We have denoted contexts that generate such systems by $\gamma^{max}$ (see
Section 1.5 for more). We start by scrutinizing the frozen account example
first shown in Section 2.1.

[Figure 3.1: Example 2. Panel (b): Cheque cleared at t + 10.]

Example 2 Charlie's bank account is temporarily suspended due to credit 
problems. Should Charlie make a sufficient deposit at his local branch, 
Banker Bob at headquarters will re-activate the account. Alice holds a cheque 
from Charlie, but trying to cash it before the account is re-activated will 
grant her a fine, rather than cash. Alice, Bob and Charlie can communicate 
over a communication network as depicted in Figure 3.1a. In particular, 
messages from Charlie to Bob and Alice take up to 10 and 12 days to be 
delivered, respectively. This scenario can be viewed as an instance of OR in 
which a deposit by Charlie is the event, and the responses are the account 
re-activation by Bob followed by Alice's cashing of the cheque. 

In a particular instance, depicted in Figure 3.1b, Charlie makes a deposit
at time t, and immediately broadcasts a message stating this to both Alice
and Bob. The message reaches Bob in 2 days and Alice in 4. Bob immediately
re-activates Charlie's account¹ at time t + 2. When can Alice deposit
the cheque? The cheque would be cashed successfully at any time after t + 2.
However, Alice only knows about Charlie's deposit at t + 4. But even at that
point, she must keep waiting. In the absence of additional information
indicating when Bob actually received Charlie's message, she is only guaranteed
that this will happen by time t + 10. Knowing Bob's protocol, she can safely
submit the cheque at or after time t + 10, but not sooner. □

In this example, Alice acts after Bob does. While in an asynchronous
setting she would need to obtain explicit notification that Bob acted, in the
synchronous setting considered in Example 2 she can base her action on the
information that Charlie sent Bob the message at time t, combined with
the bound determining when this message will arrive, and her knowledge of
Bob's protocol, which ensures that Bob will act immediately upon receiving
Charlie's message. Her action, which clearly depends on Bob's action having
taken place, can be performed without an explicit message chain from Bob.
Example 3 illustrates a variation on Example 2, in which Alice can do slightly
better.

¹For ease of exposition, we assume throughout the thesis that actions are performed
instantaneously; alternative assumptions would not significantly affect the analysis.

[Figure 3.2: Examples 3 and 4. (a) The network of Examples 3 and 4; (b) Susan speeds up Alice's response.]

Example 3 In a setting similar to Example 2, Susan is Bob's supervisor 
at the bank. The network is now as depicted in Figure 3.2a. Suppose that 
Charlie broadcasts his deposit to all three, and that communication is deliv- 
ered as in Figure 3.2b. In this case Alice can, as before, submit her cheque 
at t + 10. But she can do even better. Since she receives a message from
Susan at t + 8 that was sent at t + 3, the bound on the Susan-Bob channel
assures her that Bob is to be notified of Charlie's deposit by time t + 7. Thus
Charlie's account will also be solvent as of time t + 7, and Alice can safely
cash her cheque in this case upon receiving Susan's message. □

In both examples, the timing of Alice's action depends on the time 
bounds, but Example 3 shows a more complex interaction between mes- 
sage arrivals and time bounds. In Example 2 Alice combines information 
gained by means of a message chain from Charlie to her with the known time 
bound on the Charlie-Bob channel. Charlie's message to her serves both to 
notify her about the occurrence of a deposit event, and as a temporal anchor 
for a timing argument that allows her to properly coordinate her response 
with Bob's action. 

Example 3 starts out the same for Alice. She is still notified of the 
deposit event by a message from Charlie at t + 4. As before, this message 
can also be used to coordinate her response to follow Bob's by clearing the
cheque at t + 10. But then the message from Susan arrives, and a second
message chain between Charlie and Alice is completed. Alice already knows
about Charlie's deposit based on the earlier message. The new message 
serves her to coordinate a temporally "tighter" response to Bob's action at 
t + 8 rather than t + 10. For the purpose of coordination, Susan's message 
plays a similar role in Example 3 to that played by Charlie's in Example 2. 
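
The timing arguments of Examples 2 and 3 reduce to simple arithmetic, which the following Python sketch spells out. It is illustrative only: the names `Anchor` and `earliest_safe_time` are hypothetical, times are measured in days relative to t = 0, and the Susan-Bob bound of 4 is inferred from Example 3 (a relay sent at t + 3 is guaranteed to arrive by t + 7).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    """A message held by Alice that anchors a timing argument about Bob."""
    sent_at: int       # time at which the notice to Bob was sent
    bound_to_bob: int  # upper bound on the channel carrying it to Bob
    learned_at: int    # time at which Alice received her own copy


def earliest_safe_time(anchors):
    """Earliest time at which Alice knows that Bob has been notified."""
    # An anchor is usable once Alice holds it AND Bob's deadline has passed.
    return min(max(a.learned_at, a.sent_at + a.bound_to_bob) for a in anchors)


charlie = Anchor(sent_at=0, bound_to_bob=10, learned_at=4)  # Example 2
susan = Anchor(sent_at=3, bound_to_bob=4, learned_at=8)     # Example 3

print(earliest_safe_time([charlie]))         # Example 2: t + 10
print(earliest_safe_time([charlie, susan]))  # Example 3: t + 8
```

The `max` captures the two roles a message plays (notification and temporal anchor), while the `min` reflects that Alice may use whichever anchor lets her act first.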

Had the triggering event $e_t$ been unconditionally guaranteed to take
place at some time $t_0$, then the protocol could directly specify the times
$t_k > t_{k-1} > \cdots > t_1 > t_0$ at which the response actions could be
performed with proper coordination. However, since $e_t$ is a spontaneous event,
knowledge about its occurrence must "flow" from $i_0$ to the responding sites.
With respect to coordination, however, the above examples demonstrate
that coordination between the sites does not necessarily require explicit
communication between successive responses. A site $i_{h+1}$ may be able to know
that $\alpha_h$ has taken place by combining a priori knowledge regarding timing
guarantees, information it has regarding other processes' protocols, and
information it obtains via explicit communication.

Examples 2 and 3 illustrate how a process can come to coordinate its 
response with another process despite the lack of explicit communication 
between the sites, based on knowledge of existing upper bounds on com- 
munication. But bounds can be used in an additional fashion. Namely, if 
by time t + maxij process j receives no message sent by i at time t, then j 
can discover that no such message was sent [27]. Depending on i's protocol, 
this can provide j with information about i's state at time t. Consider the 
following refinement of Example 3. 

Example 4 In the network of Example 3 depicted in Figure 3.2a, suppose 
that Susan sends Alice a message in every round as long as Susan has not 
heard from Charlie about an appropriate deposit. In this particular instance, 
Susan receives a message from Charlie at time t+2, at which point she stops 
sending her update messages. At time t + 7 Alice will be able to "time-out" 
on Susan's time t+2 message. She then knows that Susan heard from Charlie 
at t + 2. Moreover, knowing that Susan relays information to Bob as before, 
Alice knows that Bob heard about the deposit no later than time t+6. Hence, 
Alice can safely cash her cheque at time t + 7 rather than t + 10. □ 

In Example 4 Alice learns of Charlie's deposit without receiving any 
message whatsoever. She clearly receives no message chain originating from 
Charlie. Nevertheless, it seems instructive to think of Susan as sending 
Alice a "NULL message" in the sense of [27] at time t + 2, carrying relevant
information, by not sending an actual message. Under this interpretation,
Example 4 contains a message chain from Charlie to Alice that consists of 
Charlie's concrete message to Susan, followed by Susan's Null message to 
Alice. 
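
The time-out reasoning of Example 4 can be sketched in Python as follows. This is a hedged illustration rather than part of the formal development: the function name `earliest_detected_silence` is hypothetical, and the sketch assumes that every message carries the time at which it was sent, so that the receiver can tell which send times are accounted for.

```python
def earliest_detected_silence(received_send_times, bound, now):
    """Return the earliest send time s for which, by time `now`, the receiver
    can be sure that nothing was sent, or None if no silence is detectable.

    received_send_times: send times of the messages received so far
                         (assumed to be stamped on each message)
    bound:               upper bound on transmission time over the channel
    """
    for s in range(0, now - bound + 1):
        # Anything sent at time s must have arrived by s + bound <= now.
        if s not in received_send_times:
            return s
    return None


# Example 4 with t = 0: Susan sends updates at times 0 and 1, then stops at
# time 2 upon hearing from Charlie. The Susan-Alice bound is 5.
heard_from_susan = {0, 1}
print(earliest_detected_silence(heard_from_susan, bound=5, now=6))
print(earliest_detected_silence(heard_from_susan, bound=5, now=7))
```

At time 6 nothing can be concluded, while at time 7 Alice detects Susan's silence at time 2. Adding the Susan-Bob bound of 4 then gives Bob's deadline of t + 6, which has already passed when Alice acts at t + 7.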

Lamport utilized Null messages in [27] for his algorithms implementing 
the state machine model in the synchronous setting. In the current paper,
however, Null messages will be used to define a relation called syncausality,
extending happened before. A syncausal chain will then be a chain consisting 
of a sequence of concrete and Null messages. Syncausal chains are required 
for information flow regarding nondeterministic events such as spontaneous 
external inputs. 

3.2 Bound Guarantees 

In the synchronous model we consider, every directed communication link 
between adjacent processes i and j provides a bound maxij on the maximal 
transmission time of messages. These local bounds naturally induce more 
general bounds, or guarantees as we call them, for any pair of (not necessarily 
directly) connected sites. 



Consider Figure 3.3a, showing a variant of the network graph shown in 
Figure 3.2a. Assuming that all of the sites involved are fully cooperative 
in relaying messages, a message sent by Charlie with destination Bob can 
be guaranteed to arrive after 9 rounds, if it travels from Charlie to Susan 
and then to Bob, rather than from Charlie to Bob directly. Similarly, a 
message from Susan to Alice can be guaranteed to arrive after no more than 
5 rounds, if it makes the roundabout trip through Bob. Charlie and Alice 
may also communicate, with a bound of 11 rounds, if they use Susan and
Bob as relays.

[Figure 3.3: Timing guarantees. (a) A variant network graph; (b) Transmission distances.]
Thus, the naturally induced transmission distance between processes h
and k, which we denote by $D(h, k)$, is the minimal distance between h and k
in the weighted directed network graph, in which the bounds $\mathit{max}_{ij}$ on
transmission times are the edge weights. In particular, $D(i,i) = 0$ for all $i \in \mathbb{P}$.
Figure 3.3b shows the induced transmission distances for the network of
Figure 3.3a. Values that differ from the ones in (a) are shown in a gray box.
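
Since $D$ is an all-pairs shortest-path distance, it can be computed with any standard algorithm. The sketch below uses Floyd-Warshall. The function name `transmission_distances` is hypothetical, and the edge weights are assumptions chosen only so that they reproduce the three distances quoted in the text (9, 5 and 11); the actual weights of Figure 3.3a are not recoverable here.

```python
import math


def transmission_distances(bounds):
    """All-pairs transmission distances D(i, j) from per-channel bounds.

    bounds: {(i, j): max_ij} for every directed channel i -> j.
    """
    procs = sorted({p for channel in bounds for p in channel})
    dist = {
        i: {j: 0 if i == j else bounds.get((i, j), math.inf) for j in procs}
        for i in procs
    }
    # Floyd-Warshall: allow each process in turn to act as a relay.
    for relay in procs:
        for i in procs:
            for j in procs:
                via = dist[i][relay] + dist[relay][j]
                if via < dist[i][j]:
                    dist[i][j] = via
    return dist


# Hypothetical channel bounds (illustrative, not taken from Figure 3.3a).
bounds = {
    ("Charlie", "Susan"): 6,
    ("Susan", "Bob"): 3,
    ("Charlie", "Bob"): 10,
    ("Bob", "Alice"): 2,
    ("Susan", "Alice"): 8,
}
D = transmission_distances(bounds)
print(D["Charlie"]["Bob"], D["Susan"]["Alice"], D["Charlie"]["Alice"])
```

With these weights the relayed routes beat the direct channels, matching the discussion above: Charlie to Bob via Susan, Susan to Alice via Bob, and Charlie to Alice via both.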

We find it convenient to represent the time instant t on process i's
timeline by the pair $(i, t)$, called a process-time node, or node for short. Based
on the transmission time bounds for the channels, we define the following
bound guarantee relation among process-time nodes:

Definition 8 (Bound Guarantee) We say that $(i,t)$ and $(j,t')$ are related
by a bound guarantee, and write $(i,t) \dashrightarrow (j,t')$, iff $t + D(i,j) \le t'$.

Observe that bound guarantees are independent of the speed at which
messages actually arrive in a particular run; they depend only on the weighted
network topology. If $(i, t) \dashrightarrow (j, t')$ then it is possible to guarantee that a
message sent by i at time t will arrive at j by time $t'$, assuming that relay is
instantaneous. Since the bound-weighted network is assumed to be known
to the processes, the passage of time can allow a process to obtain knowledge
about remote events that would not be available, say, in an asynchronous
setting.
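
As a small illustration, the bound guarantee relation is a one-line predicate once the distances are known. The sketch below is hypothetical in its naming (`bound_guarantee`) and reads the inequality of Definition 8 as non-strict, which the reflexivity of the relation requires given that $D(i,i) = 0$. The distance of 10 from Charlie to Bob is the one used in Example 2.

```python
def bound_guarantee(D, src, dst):
    """True iff src = (i, t) and dst = (j, t') satisfy t + D(i, j) <= t'."""
    (i, t), (j, t_prime) = src, dst
    return t + D[i][j] <= t_prime


# Distances needed for Example 2, with the deposit made at t = 0.
D = {"Charlie": {"Charlie": 0, "Bob": 10}}

print(bound_guarantee(D, ("Charlie", 0), ("Bob", 10)))  # Alice may act at t + 10
print(bound_guarantee(D, ("Charlie", 0), ("Bob", 6)))   # too early at t + 6
```

Note that the predicate never inspects a run: it depends only on the weighted network, as observed above.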

The next sections will explore the ways by which this knowledge can be 
exploited. 

3.3 Syncausality 

In Example 4 Alice learns of Charlie's deposit without a message chain from
Charlie reaching her. A message chain of a slightly more general type does
exist there, however, in which Susan's not sending a message to Alice at
time t + 2 is a Null message. More formally, consider a network in which
i and j are directly connected by a communication link with bound $\mathit{max}_{ij}$.
Then i can be thought of as "sending" a Null message over this channel
at $(i,t)$ if it sends no physical message over the channel at time t. This
message is considered as being "delivered" to j at $(j, t + \mathit{max}_{ij})$ (see [27]).
In the presence of clocks and bound guarantees, Null messages can serve to
transfer information between processes. By identifying that no message was
sent at $(i, t)$, process j may be able to draw nontrivial conclusions about i's
state and i's knowledge there.

We now formally define Syncausality, a generalization of Lamport's
happened-before relation that accounts for Null messages and is thus based 
on "generalized" message chains that can contain Null messages as well 
as normal messages. The relation is defined over process-time nodes rather 
than events, since defining not sending and not receiving messages as explicit 
events would be cumbersome. 

Definition 9 (Syncausality) Fix a run r. The syncausality relation $\rightsquigarrow$
over nodes of r is the smallest relation satisfying the following four
conditions:

1. If $t \le t'$, then $(i,t) \rightsquigarrow (i,t')$;

2. If some message is sent at $(i,t)$ and received at $(j,t')$ then
$(i,t) \rightsquigarrow (j,t')$;

3. If no message is sent at $(i,t)$ to i's neighbor j then
$(i,t) \rightsquigarrow (j, t + \mathit{max}_{ij})$; and

4. If $(i,t) \rightsquigarrow (h,\hat{t})$ and $(h,\hat{t}) \rightsquigarrow (j,t')$, then $(i,t) \rightsquigarrow (j,t')$.

Clauses (1), (2) and (4) correspond to the local precedence, message
precedence and transitivity clauses that define the happened-before relation.
Syncausality thus refines (and hence directly generalizes) happened-before.
The third clause corresponds to timeout precedence, capturing the case of a
Null message being sent at $(i, t)$ and eventually received at $(j, t + \mathit{max}_{ij})$.
We can thus view syncausality as being based on syncausal chains, consisting
of a chain of actual and Null messages.
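
For a finite prefix of a run, the four clauses of the definition amount to graph reachability over process-time nodes. The following Python sketch makes this concrete under stated assumptions: the function name `syncausally_precedes` and the encoding of a run as a `deliveries` dictionary are hypothetical, at most one message per channel per time step is assumed, and the Charlie-Susan bound of 5 is an illustrative value that the text does not specify. The data models Example 4 with t = 0.

```python
from collections import deque


def syncausally_precedes(a, b, bounds, deliveries, horizon):
    """Decide whether node a syncausally precedes node b in a finite run.

    bounds:     {(i, j): max_ij} for every directed channel
    deliveries: {(i, j, send_time): receive_time} for messages actually sent
    horizon:    last time step of the run prefix under consideration
    """
    seen, queue = {a}, deque([a])  # starting from a covers the reflexive case
    while queue:
        i, t = queue.popleft()
        successors = []
        if t + 1 <= horizon:
            successors.append((i, t + 1))  # clause 1: local precedence
        for (src, dst), bound in bounds.items():
            if src != i:
                continue
            # Clause 2 if a message was sent at (i, t); otherwise clause 3,
            # a Null message "delivered" exactly at the bound.
            arrival = deliveries.get((i, dst, t), t + bound)
            if arrival <= horizon:
                successors.append((dst, arrival))
        for node in successors:  # clause 4: transitivity via search
            if node not in seen:
                seen.add(node)
                queue.append(node)
    return b in seen


bounds = {("Charlie", "Susan"): 5, ("Susan", "Alice"): 5}
deliveries = {
    ("Charlie", "Susan", 0): 2,  # Charlie's notice reaches Susan at time 2
    ("Susan", "Alice", 0): 1,    # Susan's periodic updates, sent while she
    ("Susan", "Alice", 1): 2,    # has not yet heard from Charlie
}
print(syncausally_precedes(("Charlie", 0), ("Alice", 7), bounds, deliveries, 12))
print(syncausally_precedes(("Charlie", 0), ("Alice", 6), bounds, deliveries, 12))
```

In this run no concrete message chain leads from Charlie to Alice at all; the connection to (Alice, 7) arises solely through Susan's Null message at time 2.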

Syncausality is also a generalization of bound guarantees. Note that
nodes $(i,t)$ and $(j, t + \mathit{max}_{ij})$ will always be syncausally related: either $(i,t)$
does send j a message, in which case the message is received by $(j, t + \mathit{max}_{ij})$
and $(i,t) \rightsquigarrow (j, t + \mathit{max}_{ij})$ will hold by message and local precedence, or no
such message is sent, in which case $(i,t) \rightsquigarrow (j, t + \mathit{max}_{ij})$ will hold
by timeout precedence. Indeed, a straightforward induction on the number
of edges in the shortest path of length $D(i,j)$ between nodes i and j in the
network immediately yields:²

Lemma 3 If $(i,t) \dashrightarrow (j,t')$ then $(i,t) \rightsquigarrow (j,t')$ in every run r.

²In fact, given the natural definition of $\rightarrow$ over process-time nodes, $\rightsquigarrow$ is the coarsest
common refinement of $\rightarrow$ and $\dashrightarrow$.
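
The induction behind Lemma 3 can be spelled out along a shortest path. The following is only a sketch of the argument: the names $p_0, \ldots, p_m$ for the processes on the path and $s_l$ for the partial sums of bounds are introduced here for illustration.

```latex
% Shortest path i = p_0, p_1, ..., p_m = j, so that
% D(i,j) = \sum_{l=1}^{m} \mathit{max}_{p_{l-1} p_l}.
\begin{align*}
s_0 &= t, \qquad s_l = s_{l-1} + \mathit{max}_{p_{l-1} p_l}
      \quad (1 \le l \le m), \\
(p_{l-1}, s_{l-1}) &\rightsquigarrow (p_l, s_l)
      \quad \text{for each } l
      \text{ (message and local precedence, or timeout precedence)}, \\
(i,t) = (p_0, s_0) &\rightsquigarrow (p_m, s_m) = (j,\, t + D(i,j))
      \quad \text{(transitivity)}, \\
(j,\, t + D(i,j)) &\rightsquigarrow (j, t')
      \quad \text{since } t + D(i,j) \le t' \text{ (local precedence)}.
\end{align*}
```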
Notice that the bound guarantee relation depends only on the network
and the transmission bounds $\mathit{max}_{ij}$. We view it as being given a priori as
part of the context. By contrast, clause (2) of the syncausality relation, 
capturing message precedence, depends on the actually realized message 
transmission times in a given run. Therefore, syncausality is run-dependent. 

We will show in Section 3.6 that knowledge about the occurrence of
nondeterministic events can be obtained only by way of syncausal chains.
Consequently, for the Ordered Response problem we can show the following:

Theorem 3 Let P be a protocol solving OR = $\langle e_t, \alpha_1, \ldots, \alpha_k \rangle$. If $e_t$ occurs
at $(i_0,t)$ in $r \in \mathcal{R}^{max}$, and $\alpha_h$ does at $(i_h,t_h)$, then $(i_0,t) \rightsquigarrow (i_h,t_h)$ in r.

The formal proof of Theorem 3 can be found in Section 3.6. It is obtained
by first showing that $i_h$ needs to know at $(i_h,t_h)$ that $e_t$ occurred,
and then showing that without the syncausal connection process $i_h$ cannot
know this.



3.4 Double Response 

The examples in the introduction all involve a simple problem of the form
OR = $\langle e_t, \alpha_1, \alpha_2 \rangle$. We call this form a double response. We can view a double
response as incorporating two single responses to the triggering event, with
an added coordination requirement to ensure that $\alpha_2$ does not occur
before $\alpha_1$.

If the first response is performed by $i_1$ at time $t_1$, and the second by
$i_2$ at $t_2$, then Theorem 3 implies that in a triggered run of any protocol
solving OR, necessarily $(i_0,t) \rightsquigarrow (i_1,t_1)$ and $(i_0,t) \rightsquigarrow (i_2,t_2)$ must hold. A
number of ways by which the required coordination between $i_1$ and $i_2$ may
be achieved have already been informally considered in Sections 3.1 and 2.1.

First, if $(i_0,t) \rightsquigarrow (i_1,t_1) \rightsquigarrow (i_2,t')$, as seen in Figure 3.4, then it is easy
to see that $t_1 \le t'$. Wavy arrows have replaced the straight ones used
in Section 3.1, to denote the possibility of nontrivial syncausal chains
connecting the nodes. This case echoes the chain structures prevalent in
the asynchronous model, with syncausal chains replacing the pure message
chains. Yet as our previous examples have shown, there are other possible
means for coordination.

[Figure 3.4]
In Example 2 we had $i_2$ (played out by Alice) waiting until a time
$t' = t + 10$ such that $(i_0,t) \dashrightarrow (i_1,t')$. So by time $t'$, when $\alpha_2$ is
performed, $i_1$ must surely have gotten the message about the occurrence of
$e_t$ and have performed $\alpha_1$. This possibility is schematically shown in
Figure 3.5. The figure shows the syncausal relations that are necessary
in order for $i_2$ to coordinate its action with $i_1$. As $i_2$ need not be aware
of the realized connection between $i_0$ and $i_1$ and the time $t_1$ when $\alpha_1$ was
actually performed, this connection is not shown.

[Figure 3.5]

In Examples 3 and 4 there exists some node $(i_3,t_3)$, representing Susan
at times t + 3 and t + 2 respectively, that is tied into the coordination
process as depicted in Figure 3.6.

In this case $\alpha_2$ is set by $i_2$ to a time $t'$ such that $(i_3,t_3) \dashrightarrow (i_1,t')$.
Observe that the shadowed lines in Figure 3.2b outline an underlying
formation identical to the one shown in Figure 3.6.

[Figure 3.6]

Another formation that can be used to ensure coordination had been
brought up in Example 1. Consider Figure 3.7. Here $i_2$ serves as a relay for
$i_1$, and performs $\alpha_2$ at a time $t'$ such that $(i_2,t_2) \dashrightarrow (i_1,t')$. In terms of
the cheque clearance scenario from Section 3.1, this would be equivalent to
Alice getting a message from Charlie at time $t_2$ and then forwarding it on
to Bob. Knowing that her message will not take longer than 5 days to arrive,
she waits until day $t_2 + 5$ and then clears her cheque.

[Figure 3.7]

What about other formations? Consider the scenario depicted in Figure 3.8.
It can be associated with the situation in Example 2, at time $t' = t + 6$.
At this stage, both Alice and Bob have received a message from Charlie, but
Alice cannot be sure that the message to Bob has indeed arrived, because the
transmission distance between Charlie and Bob is 10. Returning to the figure,
at $t'$ process $i_2$ cannot perform $\alpha_2$ without risking the possibility that
$\alpha_1$ has not yet been performed. A protocol in which $i_2$ does do $\alpha_2$ at $t'$
is one that does not solve the instance of the OR problem in cases where the
message to $i_1$ takes longer than 6 time steps to arrive. As we are assuming
a protocol that solves the ordering problem in all runs, such a scenario is
impossible.

[Figure 3.8]

We can show that the formations in Figures 3.4 to 3.7 exhaust the possible
causal formations in a protocol that solves the double response OR
problem. All of the possible configurations considered above are described
by the generalized Figure 3.9, where $\theta$ is a parameterized node on a syncausal
path between $(i_0,t)$ and $(i_2,t_2)$. If we set $\theta = (i_1, t_1)$ we get the setting
shown in Figure 3.4. Similarly, setting $\theta$ to $(i_0,t)$, $(i_3,t_3)$ and $(i_2,t_2)$ gives
us Figures 3.5, 3.6 and 3.7 respectively.³



3.5 Centipedes 

The double response problem incorporates two aspects required for ordering 
responses: notification of the responding sites regarding the occurrence of 
the trigger event, and coordination of the responses between these sites. 
This section tackles the OR problem in its most general form, where any 
number of responses may be required to the trigger's occurrence. 

Comparing the individual instances in Figures 3.4 to 3.7 to their
generalization in Figure 3.9, we see a pattern emerging, where the actual node
standing in for the parameterized $\theta$ gets informed of the occurrence of the
trigger event, and serves to split the path and route the new information
to both $(i_1,t')$ and $(i_2,t')$. The condition on this splitting node is that it
must be able to guarantee the arrival of information at $i_1$ by time $t'$. This
promise can then be used by $i_2$ to coordinate its own action with that of $i_1$.

Intuitively, we would then expect to see similar split-and-promise mechanisms
crop up further down the line when multiple responses are required.
This idea gives rise to the centipede structure, defined below.

³Recall that both $\rightsquigarrow$ and $\dashrightarrow$ are reflexive relations, so for example setting $\theta = (i_0,t)$
is not at odds with $(i_0,t) \rightsquigarrow \theta$.

[Figure 3.9: Coordination in Double Response]

[Figure 3.10: A centipede]

Definition 10 (Centipede) Let $r \in \mathcal{R}^{max}$, let $i_h \in \mathbb{P}$ for $0 \le h \le k$ and
let $t \le t'$. A centipede for $\langle i_0, \ldots, i_k \rangle$ in the interval $(r, t..t')$ is a sequence
of nodes $\theta_0 \rightsquigarrow \theta_1 \rightsquigarrow \cdots \rightsquigarrow \theta_k$ such that $\theta_0 = (i_0,t)$, $\theta_k = (i_k,t')$, and
$\theta_h \dashrightarrow (i_h,t')$ holds for $h = 1, \ldots, k-1$.

A centipede for $\langle i_0, \ldots, i_k \rangle$ in the interval $(r, t..t')$ is depicted in
Figure 3.10. Extending Figure 3.9, Figure 3.10 shows a syncausal chain
extending between $(i_0, t)$ and $(i_k, t')$, and along this chain a sequence of "route
splitting" nodes $\theta_1, \theta_2$, etc. such that each $\theta_h$ can guarantee the arrival of
a message to $i_h$ by time $t'$. Such a message can serve to inform $i_h$ of the
occurrence of the trigger event, and as the set of previously made guarantees
gets shuffled on to the next splitting node, each responding site $i_h$ can be
confident that all previous sites $\langle i_1, \ldots, i_{h-1} \rangle$ had already responded.

We remark that, since both $\rightsquigarrow$ and $\dashrightarrow$ are reflexive, it is possible for
adjacent $\theta_j$'s to coincide. Moreover, it is possible (in fact, probably quite
common) that $\theta_h = (i_h,t_h)$ for some $t_h \le t'$. Observe that every simple
(Lamport-style) message chain gives rise to a centipede of a simple form
in which all body nodes $\theta_h$ are co-located in this sense with their respective
leg nodes. It follows that a centipede is a natural, albeit nontrivial,
generalization of a Lamport-causal chain.

While the centipede structure may seem rather intuitive, it is not at all
clear that such a structure should exist whenever the OR problem is solved.
Nevertheless, Theorem 4 below shows that this is the case, thus providing a
concise statement of the communication structures that are required for the
ordering of events in every synchronous system.

Theorem 4 (Centipede Theorem) Let P be a protocol solving
OR = OR$(e_t, \alpha_1, \ldots, \alpha_k)$ in $\gamma^{max}$ and assume that $e_t$ occurs at $(i_0, t)$ in $r \in \mathcal{R}^{max}$.
If $i_k$ performs $\alpha_k$ at time $t'$ in r then there is a centipede for $\langle i_0, \ldots, i_k \rangle$
in $(r, t..t')$.

[Figure 3.11: A centipede in Example 3]

Note however the subtle relation between the Ordered Response requirement
and the structures that it necessitates. Suppose that $e_t$ occurs at
time t in a run $r \in \mathcal{R}(P, \gamma^{max})$ where P solves OR = $\langle e_t, \alpha_1, \ldots, \alpha_k \rangle$. Then
there exist times $t_1 \le t_2 \le \cdots \le t_k$ where the actions $\alpha_1, \alpha_2, \ldots, \alpha_k$ are
performed respectively. The theorem implies that for each $h \le k$ there exists
a centipede for $\langle i_0, \ldots, i_h \rangle$ in $(r, t..t_h)$. Such a centipede could be used to
inform all $\langle i_0, \ldots, i_h \rangle$ processes of the occurrence of $e_t$, and would also serve
to coordinate response $\alpha_h$ so as not to occur before any of the responses
$\alpha_1, \ldots, \alpha_{h-1}$. However, a centipede for $\langle i_0, \ldots, i_h \rangle$ in $(r, t..t_h)$ need not be a
sub-structure in a centipede for $\langle i_0, \ldots, i_{h+1} \rangle$ that occurs in $(r, t..t_{h+1})$.

Recall Example 3 in the introduction, here redrawn in Figure 3.11. The
centipede for $\langle$Charlie, Bob$\rangle$ is the sequence $\langle$(Charlie, t), (Bob, t + 2)$\rangle$. The
centipede for $\langle$Charlie, Bob, Alice$\rangle$ that is used by Alice to coordinate her
actions with those of Bob is the sequence
$\langle$(Charlie, t), (Susan, t + 3), (Alice, t + 8)$\rangle$.
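
The conditions in the definition of a centipede can be checked mechanically for a candidate sequence of nodes. The Python sketch below does so for Example 3, with t = 0 and Alice acting when Susan's message arrives at t + 8, as described in that example. The names `happened_before` and `is_centipede` are hypothetical. To keep the sketch short, the chain condition is tested using local steps and realized deliveries only; this under-approximates syncausality, since Null messages are ignored, but it suffices for this run.

```python
def happened_before(a, b, deliveries):
    """Reachability from node a to node b via local steps and realized
    message deliveries (Null messages are deliberately ignored here)."""
    stack, seen = [a], {a}
    while stack:
        proc, time = stack.pop()
        if proc == b[0] and time <= b[1]:
            return True  # finish with a local step on b's timeline
        for (src, sent), received in deliveries:
            if src == proc and sent >= time and received not in seen:
                seen.add(received)
                stack.append(received)
    return False


def is_centipede(thetas, procs, t, t_prime, D, deliveries):
    """Check a candidate body theta_0..theta_k against the definition."""
    k = len(procs) - 1
    if len(thetas) != k + 1:
        return False
    if thetas[0] != (procs[0], t) or thetas[k] != (procs[k], t_prime):
        return False
    chain_ok = all(
        happened_before(thetas[h], thetas[h + 1], deliveries) for h in range(k)
    )
    # Legs: theta_h must guarantee arrival at i_h by t', for h = 1..k-1.
    legs_ok = all(
        thetas[h][1] + D[thetas[h][0]][procs[h]] <= t_prime for h in range(1, k)
    )
    return chain_ok and legs_ok


# Example 3: realized deliveries as ((sender, send_time), (receiver, time)).
deliveries = [
    (("Charlie", 0), ("Bob", 2)),
    (("Charlie", 0), ("Susan", 3)),
    (("Susan", 3), ("Alice", 8)),
]
D = {"Susan": {"Bob": 4}}  # Susan-Bob bound inferred from Example 3
body = [("Charlie", 0), ("Susan", 3), ("Alice", 8)]
print(is_centipede(body, ["Charlie", "Bob", "Alice"], 0, 8, D, deliveries))
```

The single leg here is Susan's guarantee toward Bob: 3 + 4 = 7 does not exceed 8, so Alice can rely on Bob having been notified without ever hearing from him.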

3.6 Knowledge Requires Syncausality 

This section begins the second part of the chapter, wherein we review the 
findings described thus far in light of the knowledge-based analysis paradigm. 
This will provide us with formal proofs to all quoted theorems, as well as a 
deeper understanding of the forces at play. 

A word on terminology. Despite not having proved Theorems 3 and 4 
just yet, we will take to referring to syncausality and the centipede as causal 
structures, anticipating the results of the coming sections. 

We start by considering the basic syncausality relation $\rightsquigarrow$. In Section 3.3
we stated that every response event must be syncausally related to the
trigger $e_t$. In this section we prove this claim formally.

A simple and useful property of syncausality is a slight extension of the
idea that if one node syncausally affects another, then the former must have
happened before the latter one:

Lemma 4 If $(i, t) \rightsquigarrow (j, t')$ then $t \le t'$, with $t = t'$ holding only if $i = j$.

Proof All $(i, t) \rightsquigarrow (j, t')$ causality instances generated by clause 1 have the
property that $i = j$ and $t \le t'$. By definition of the synchronous context
$\gamma^{max}$, messages take at least one time step to be delivered. Moreover, the
upper bounds on message transmission times are assumed to satisfy
$\mathit{max}_{ij} \ge 1$. Therefore, all $(i,t) \rightsquigarrow (j,t')$ causality instances generated by clauses 2
and 3 have the property that $i \ne j$ and $t < t'$. As a result, a straightforward
induction on the number of times the transitivity clause 4 is applied in a
derivation of $(i,t) \rightsquigarrow (j, t')$ yields the desired claim. ■

Lamport relates the happened-before relation to light cones in Minkowski 
space-time [26]. In the same vein, it is natural to consider past and future 
causal "cones" induced by syncausality. 

Definition 11 (past and fut cones) We define the future causal cone of
a node $\alpha$ (in run r) to be

    fut$(r, \alpha)$ = $\{ \langle \theta, ND_\theta \rangle : \alpha \rightsquigarrow \theta$ in r and $ND_\theta$
    is the set of ND events and initial states in $\theta$ in r $\}$.

Similarly, the past causal cone of $\alpha$ is

    past$(r, \alpha)$ = $\{ \langle \theta, ND_\theta \rangle : \theta \rightsquigarrow \alpha$ in r and $ND_\theta$
    is the set of ND events and initial states in $\theta$ in r $\}$.

We will often treat the sets past and fut more simply as sets of nodes,
rather than sets of pairs, when the related ND events are immaterial in the
context.

Observe that the cones induced by syncausality in synchronous systems
are significantly larger than the ones that follow just from Lamport's
happened-before relation. Moreover, just as the future and past cones meet
at the current point in space-time for light cones, we can use Lemma 4 to
show:

Lemma 5 For all runs $r \in \mathcal{R}^{max}$ and nodes $\alpha$ and $\beta$:

1. fut$(r, \alpha) \cap$ past$(r, \alpha) = \{\alpha\}$, and

2. $\alpha \rightsquigarrow \beta$ iff fut$(\alpha) \cap$ past$(\beta) \ne \emptyset$.
Proof Let $\alpha = (i,t)$ be a process-time node. For part (1), observe that
$\alpha \rightsquigarrow \alpha$ since $\rightsquigarrow$ is reflexive by clause (1) of Definition 9. Hence
$\alpha \in$ fut$(r, \alpha) \cap$ past$(r, \alpha)$. By Lemma 4, if $\alpha \ne \beta = (j,t') \in$ past$(\alpha)$, then
$t' < t$. Similarly, if $\alpha \ne \theta = (h, t'') \in$ fut$(\alpha)$ then $t < t''$. It follows that
fut$(r, \alpha) \cap$ past$(r, \alpha) = \{\alpha\}$.

For part (2), assume that $\alpha \rightsquigarrow \beta$. Then $\alpha \in$ past$(\beta)$ by definition of past.
Since $\alpha \in$ fut$(\alpha)$ by part (1), it follows that $\alpha \in$ fut$(\alpha) \cap$ past$(\beta) \ne \emptyset$. For
the other direction, suppose that $\theta \in$ fut$(\alpha) \cap$ past$(\beta)$. Then by definition
we have that $\alpha \rightsquigarrow \theta$ and $\theta \rightsquigarrow \beta$. By transitivity of $\rightsquigarrow$ (clause 4) we have
that $\alpha \rightsquigarrow \beta$, and we are done. ■
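
Treating the cones as plain sets of nodes, both parts of Lemma 5 can be checked on a small example. The Python sketch below is illustrative only: the helper names are hypothetical, and the toy run (two processes i and j over times 0 to 3, with a channel bound of 2 from i to j) is an assumption. In it, i sends a message at time 0 that j receives at time 1, and i stays silent at time 1, which yields a Null message delivered at (j, 3).

```python
def reachable(start, edges):
    """All nodes reachable from start, including start itself."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def fut(alpha, edges):
    """Future cone of alpha, as a set of nodes."""
    return reachable(alpha, edges)


def past(alpha, edges):
    """Past cone of alpha: reachability in the reversed graph."""
    reverse = {}
    for src, dsts in edges.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    return reachable(alpha, reverse)


# Direct syncausal steps of the toy run (local, message and Null steps).
edges = {
    ("i", 0): [("i", 1), ("j", 1)],  # local step; message received at (j, 1)
    ("i", 1): [("i", 2), ("j", 3)],  # local step; Null message at (j, 3)
    ("i", 2): [("i", 3)],
    ("j", 0): [("j", 1)],
    ("j", 1): [("j", 2)],
    ("j", 2): [("j", 3)],
}
nodes = [(p, t) for p in ("i", "j") for t in range(4)]

part1 = all(fut(a, edges) & past(a, edges) == {a} for a in nodes)
part2 = all(
    (b in fut(a, edges)) == bool(fut(a, edges) & past(b, edges))
    for a in nodes
    for b in nodes
)
print(part1, part2)
```

Part (1) relies on every non-reflexive step moving strictly forward in time, exactly as in the appeal to Lemma 4 in the proof.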

The next step in relating knowledge to syncausality in synchronous systems
comes from the observation that the events that occur in the past
(syncausal) cone of a node completely determine the local state at the node.
A proof by induction on all nodes $(j, t')$ with $0 \le t' \le t$ shows:

Lemma 6 Let $r, r' \in \mathcal{R}^{max}$.

If past$(r, (i, t))$ = past$(r', (i, t))$ then $r_i(t) = r'_i(t)$.

Proof A straightforward proof by induction on $t'$ in the range $0 \le t' \le t$
shows that, for all $j \in \mathbb{P}$, if $(j,t') \in$ past$(i,t)$ then $r_j(t') = r'_j(t')$. For
the base case, r and r' agree by assumption on initial states in $(j, 0)$, and
thus $r_j(0) = r'_j(0)$ for every such j. The induction step
is proved based on the fact that each local state which is not initial is
determined by the previous local state of the same process and by ND events
in that process in the last round. The
claim follows from the fact that $(i,t) \in$ past$(r, (i,t))$. ■

Since the knowledge of a process in $\mathcal{R}^{max}$ is determined by its local state,
Lemma 6 implies that this knowledge is determined by the past causal cone.
We can now state and prove, using Lemma 6, the following knowledge gain
theorem for two processes:

Theorem 5 (Basic Knowledge Gain) Assume that e takes place at $(i_0,t)$
in $r \in \mathcal{R}^{max} = \mathcal{R}(P, \gamma^{max})$. If $(\mathcal{R}^{max}, r, t') \models K_{i_1}(\mathit{occurred}(e) \wedge \mathit{ND}(e))$ then
$(i_0,t) \rightsquigarrow (i_1,t')$.

Proof Let e be an ND event occurring at $(i_0, t)$ in the run $r \in \mathcal{R}^{max}$.
We shall prove the contrapositive: If $(i_0,t) \not\rightsquigarrow (i_1,t')$ then
$(\mathcal{R}^{max}, r, t') \not\models K_{i_1}(\mathit{occurred}(e) \wedge \mathit{ND}(e))$. By assumption, $\mathcal{R}^{max} = \mathcal{R}(P, \gamma^{max})$ for some
protocol P. Let $r' \in \mathcal{R}^{max}$ be a run identical to r until (but not including)
time t, in which

(i) the environment's actions at all nodes in past$(r, (i_1,t'))$ are identical to
those in r; and

(ii) the environment's actions at $i_0$ in the interval $[t, t']$ are identical to
those in r with the exception that e does not occur in any of the related
nodes. Thus if $t = 0$ then $i_0$'s initial state in r' differs from that in r
by not including e, and similarly for other local states. Finally,

(iii) all messages delivered to nodes not in past$(r, (i_1,t'))$ are delivered at the
maximal possible transmission time according to the bounds $\mathit{max}_{ij}$.

To see that such a run r' indeed exists in $\mathcal{R}^{max}$, we note that clauses (ii)
and (iii) relate to nodes outside past$(r, (i_1,t'))$ and thus by assumption do
not contradict clause (i), and that by definition an external input event is
entirely independent of the run's past history, so its non-occurrence in an
interval of time is possible. Since $\mathcal{R}^{max}$ contains all runs of P in $\gamma^{max}$ it
must include r'.

Notice that $(\mathcal{R}^{max}, r', t') \not\models (\mathit{occurred}(e) \wedge \mathit{ND}(e))$: first, since events are
distinct in a run and since e occurs at time t in r, it does not occur in
r at any time previous to t. Since r' is identical to r until time t the
same applies for r'. Next, if e is an external input, then the possibility
for e to occur at some $\hat{t} \in [t,t']$ is foiled by (ii), while if it is a message
receive, then the following argument applies. Since $(i_0, t) \not\rightsquigarrow (i_1,t')$ then also
$(i_0, \hat{t}) \not\rightsquigarrow (i_1, t')$ for all $\hat{t} \ge t$, as $(i_0,t) \rightsquigarrow (i_0, \hat{t})$ is true of all $\hat{t} \ge t$. It follows
that $(i_0,\hat{t}) \notin$ past$(r, (i_1, t'))$ for all $\hat{t} \ge t$. So either the message is postponed
beyond time $t'$, or else it cannot be thus postponed, in which case e is not
an ND event (an early receive) when it occurs in r'. For all cases we get by
definition of $\models$ that $(\mathcal{R}^{max}, r', t') \not\models (\mathit{occurred}(e) \wedge \mathit{ND}(e))$. By definition, r'
agrees with r on initial states, external inputs, and delivery times on nodes
of past$(r, (i_1, t'))$. Thus by Lemma 6 we have that $(r,t') \sim_{i_1} (r',t')$, and
therefore $(\mathcal{R}^{max}, r, t') \not\models K_{i_1}(\mathit{occurred}(e) \wedge \mathit{ND}(e))$, as desired. ■ Theorem 5

The proof of Theorem 5 is obtained by constructing a run r'
indistinguishable to $i_1$ at $t'$ from r in which no ND events occur outside
past$(r', (i_1, t'))$ = past$(r, (i_1, t'))$. Theorem 5 captures a natural sense in
which syncausality is a notion of potential causality for the synchronous
model.

The proof of Theorem 3 can now be derived. For protocols that recall
responses, the theorem follows immediately from Theorems 5 and 1. We
prove the theorem for arbitrary protocols by relating the general case to
that of protocols that recall responses (Definition 5).

Proof of Theorem 3 Let P be a protocol solving OR = OR$(e_t, \alpha_1, \ldots, \alpha_k)$.
Define P' to be a protocol that differs from P only in that every process i
maintains a list called Responses, to which it adds an item $(\alpha_h, t_h)$
whenever i performs a response $\alpha_h$. (We assume w.l.o.g. that no list with this
name is used by P.) Notice that this list is an auxiliary variable that does
not affect the behavior of the protocol. Indeed, there is an isomorphism
between the runs of P and those of P', where the same nondeterministic
events and the same actions take place at all nodes of corresponding runs.
In particular, since P solves OR, then so does P'. By construction, P' recalls
responses. Denote $\mathcal{R}^{max\prime} = \mathcal{R}(P', \gamma^{max})$. Assume that $e_t$ occurs in the run r
of P and let r' be the corresponding run of P'.

Let $1 \le h \le k$, and let $t'$ be the time at which $i_h$ performs $\alpha_h$. Since P'
recalls responses, we have by Theorem 1 that
$(\mathcal{R}^{max\prime}, r', t') \models K_{i_h} \mathit{occurred}(e_t)$. As every external input is, in particular,
an ND event, this gives us $(\mathcal{R}^{max\prime}, r', t') \models K_{i_h}(\mathit{occurred}(e_t) \wedge \mathit{ND}(e_t))$. We
now use Theorem 5 and the fact that $e_t$ occurs at $(i_0,t)$ to conclude that
$(i_0,t) \rightsquigarrow (i_h,t')$ in r'.

Since all actions and communication events in r and in r' are the same,
it follows that $(i_0,t) \rightsquigarrow (i_h,t')$ in r too, as required. ■

3.7 Nested Knowledge Requires Centipedes 

When we move beyond single response problems into the double and $k$
response variants, Theorem 1 provides us with nested knowledge conditions.
Showing that nested knowledge implies the existence of a centipede requires
a substantial formal theory. This section develops the required theory.
The first relevant notion is captured by the following definition:

Definition 12 (Bridge nodes) Fix $r$ and let $\alpha \rightsquigarrow \alpha'$. We say that $\beta$
bridges $\alpha$ and $\alpha'$ if

1. $\alpha \rightsquigarrow \beta \dashrightarrow \alpha'$, and

2. $\alpha \rightsquigarrow \beta' \dashrightarrow \beta$ implies $\beta' = \beta$, for all nodes $\beta'$.

Intuitively, a bridge is an earliest node that is syncausally affected by a 
and precedes a' by way of a timing guarantee. Interestingly, bridges are 
guaranteed to exist: 

Lemma 7 If $\alpha \rightsquigarrow \alpha'$ then there is a node $\beta$ bridging $\alpha$ and $\alpha'$.

Proof Let $\alpha \rightsquigarrow \alpha'$, where $\alpha = (i,t)$ and $\alpha' = (j,t')$. By Lemma 4 we
have that $t \le t'$. We prove the claim by induction on $d = t' - t$. The base
case is $d = 0$, in which case $t = t'$ and by Lemma 4 we have that $\alpha = \alpha'$.
Since $\alpha \rightsquigarrow \alpha \dashrightarrow \alpha$ holds, and $\alpha \rightsquigarrow \beta' \dashrightarrow \alpha$ holds only for $\beta' = \alpha$, it
follows that $\beta = \alpha$ is a bridge as required. For the inductive step, let $d > 0$
and assume that the claim holds for all pairs of causally related nodes with
time differences strictly smaller than $d$. Since $\alpha' \dashrightarrow \alpha'$ by definition, the
assumption that $\alpha \rightsquigarrow \alpha'$ clearly implies that $\alpha \rightsquigarrow \alpha' \dashrightarrow \alpha'$. We consider
two cases. If there is no node $\beta' \ne \alpha'$ such that $\alpha \rightsquigarrow \beta' \dashrightarrow \alpha'$ then $\beta = \alpha'$
is the desired bridge node. Otherwise, such a $\beta' = (i_1, t_1)$ exists. As before,
we obtain by Lemma 4 that $t_1 < t'$. In particular, $d_1 = t_1 - t < t' - t = d$.
Thus, since $\alpha \rightsquigarrow \beta'$ we have by the inductive assumption for $d_1$ that there
is a node $\beta$ bridging $\alpha$ and $\beta'$. It follows that $\alpha \rightsquigarrow \beta \dashrightarrow \beta' \dashrightarrow \alpha'$ and $\beta$
satisfies the minimality clause (2) of Definition 12 with respect to $\alpha$. As $\dashrightarrow$
is a transitive relation, we obtain that $\alpha \rightsquigarrow \beta \dashrightarrow \alpha'$ and the claim follows.

■
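
The bridge condition can be checked mechanically on a finite set of nodes. The following is a minimal Python sketch, assuming that the syncausality and bound-guarantee relations of one fixed run are supplied explicitly as sets of ordered pairs, and reading the minimality clause as "no other node that is syncausally affected by the source has a bound guarantee to the candidate". The function name `bridges` and the three-node run are hypothetical illustrations, not part of the formal development.

```python
def bridges(alpha, alpha_p, nodes, sync, bound):
    """Return all nodes beta that bridge alpha and alpha_p.

    sync  -- set of pairs (x, y): x syncausally affects y
    bound -- set of pairs (x, y): there is a bound guarantee from x to y
    Both relations are assumed reflexive, with bound contained in sync.
    """
    # Clause 1: alpha is syncausally related to beta, and beta has a
    # bound guarantee to alpha_p.
    candidates = {b for b in nodes
                  if (alpha, b) in sync and (b, alpha_p) in bound}
    # Clause 2 (minimality): no other node affected by alpha has a
    # bound guarantee to beta.
    return {b for b in candidates
            if all(bp == b for bp in nodes
                   if (alpha, bp) in sync and (bp, b) in bound)}


# Hypothetical run: a affects b through an early receive, and b has a
# bound guarantee to c.
nodes = {"a", "b", "c"}
reflexive = {(x, x) for x in nodes}
bound = reflexive | {("b", "c")}
sync = bound | {("a", "b"), ("a", "c")}

print(bridges("a", "c", nodes, sync, bound))  # prints {'b'}
```

In this toy run the only bridge between a and c is b, the node at which the early receive takes place; c itself is ruled out by the minimality clause.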

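Bridges are closely related to early message receives:

Lemma 8 If $\alpha \ne \beta$ and $\beta$ bridges $\alpha$ and $\alpha'$, then there exists some $\beta'$ such
that $\alpha \rightsquigarrow \beta' \rightsquigarrow \beta$ and the syncausal chain $\beta' \rightsquigarrow \beta$ consists of a single early
receive.

Proof Denote $\alpha = (i_1,t_1)$ and $\beta = (i_2,t_2)$. If $\beta$ bridges $\alpha$ and $\alpha'$ then,
in particular, $\alpha \rightsquigarrow \beta$. If, in addition, $\alpha \ne \beta$ then $t_1 < t_2$ by Lemma 4. It
follows that $\alpha \rightsquigarrow \beta' \rightsquigarrow_1 \beta$, where $\beta' \rightsquigarrow_1 \beta$ is derivable by clause (1), (2), or (3),
and $\beta' = (i',t')$ for some $t' < t_2$. If $\beta' \rightsquigarrow_1 \beta$ is derivable by (1) or (3), then
$\beta' \dashrightarrow \beta$ and $\beta$ does not bridge $\alpha$ and $\alpha'$, contradicting the assumption.
The alternative is that $\beta' \rightsquigarrow_1 \beta$ is derivable from (2) but not from (3), and
hence $\beta' \rightsquigarrow_1 \beta$ must be an early receive, as claimed. ■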

The existence of bridge nodes as a special kind of node motivates an
alternative approach to defining centipedes, based on bridge nodes. We
start with the notion of centinodes:

Definition 13 (Centinode) We inductively define node $\theta$ to be a
$\langle i_0, \ldots, i_k \rangle$ centinode in $(r, t..t')$ as follows.

$k = 0$: $\theta$ is a $\langle i_0 \rangle$ centinode iff $\theta = (i_0,t)$; while

$k > 0$: $\theta$ is a $\langle i_0, \ldots, i_k \rangle$ centinode iff there exists a $\langle i_0, \ldots, i_{k-1} \rangle$ centinode
$\theta'$ in $(r, t..t')$, such that $\theta$ bridges between $\theta'$ and $(i_k,t')$ in $r$.

As a straightforward conclusion of Lemma 7 we can show:

Lemma 9 The following three are equivalent:

1. A centipede for $\langle i_0, \ldots, i_k \rangle$ in $(r, t..t')$ exists;

2. A $\langle i_0, \ldots, i_k \rangle$ centinode in $(r, t..t')$ exists; and

3. A centipede $\langle \theta_0, \ldots, \theta_k \rangle$ for $\langle i_0, \ldots, i_k \rangle$ in $(r, t..t')$ exists, in which every
node $\theta_j$ is a $\langle i_0, \ldots, i_j \rangle$ centinode in $(r, t..t')$, for $j = 0, \ldots, k$.

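Proof The truth of $3 \Rightarrow 2$ is immediate. A straightforward induction on $k$
shows that $2 \Rightarrow 1$, as the current centinode $\theta_k$ is syncausally related to $\theta_{k-1}$
and timing guarantee related to $(i_k, t')$. We now prove that $1 \Rightarrow 3$. Assume
that $C = \langle \theta_0, \ldots, \theta_k \rangle$ is a centipede for $\langle i_0, \ldots, i_k \rangle$ in $(r, t..t')$. We define by
induction on $h \le k$ centipedes $C_h = \langle \theta'_0, \ldots, \theta'_h, \theta_{h+1}, \ldots, \theta_k \rangle$ in $(r, t..t')$, in which
the nodes $\theta'_0$ to $\theta'_h$ are centinodes. The final centipede in the construction
satisfies the conditions of 3.

$h = 0$: By definition, the initial node $\theta_0$ in $C$ is a $\langle i_0 \rangle$ centinode in $(r, t..t')$.
Defining $\theta'_0 = \theta_0$ we have $C_0 = C$.

$h > 0$: Assume that a centipede $C_{h-1} = \langle \theta'_0, \ldots, \theta'_{h-1}, \theta_h, \ldots, \theta_k \rangle$ as described
above has been constructed. By Lemma 7 there exists a node $\theta'_h$
bridging $\theta'_{h-1}$ and $\theta_h$. Since $\theta_h \dashrightarrow (i_h, t')$ we get that $\theta'_h$ is a centinode
for $\langle i_0, \ldots, i_h \rangle$ in $(r, t..t')$. Define $C_h = \langle \theta'_0, \ldots, \theta'_h, \theta_{h+1}, \ldots, \theta_k \rangle$. If $h = k$
then we are done. Otherwise, since $\theta'_h \dashrightarrow \theta_h \rightsquigarrow \theta_{h+1}$ and $C_{h-1} =
\langle \theta'_0, \ldots, \theta'_{h-1}, \theta_h, \ldots, \theta_k \rangle$ is a centipede for $\langle i_0, \ldots, i_k \rangle$ in $(r, t..t')$, we obtain
that $C_h$ is also such a centipede, as required.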
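
The inductive definition of centinodes lends itself to a direct computation. The sketch below is only an illustration under stated assumptions: the bound guarantee from $(i,t)$ to $(j,t')$ is taken to mean $t + D(i,j) \le t'$ for a table `D` of shortest-path distances over the channel upper bounds, syncausality is taken to be the transitive closure of bound guarantees and actual message deliveries, and the minimality clause for bridges is read as "no other node affected by the source has a bound guarantee to the candidate". All function names and the three-process run are hypothetical.

```python
import math


def bound(D, a, b):
    """Bound guarantee from a to b, assumed to mean t + D[i][j] <= t'."""
    (i, t), (j, t_p) = a, b
    return t + D[i][j] <= t_p


def syncausal_reach(nodes, deliveries, D):
    """Map each node to the set of nodes it syncausally affects."""
    reach = {a: {b for b in nodes if bound(D, a, b)} for a in nodes}
    for sender, receiver in deliveries:   # actual message deliveries
        reach[sender].add(receiver)
    for k in nodes:                       # Warshall-style transitive closure
        for a in nodes:
            if k in reach[a]:
                reach[a] |= reach[k]
    return reach


def bridges(a, a_p, reach, D):
    """Nodes bridging a and a_p, under the assumptions stated above."""
    candidates = [b for b in reach[a] if bound(D, b, a_p)]
    return {b for b in candidates
            if all(bp == b for bp in reach[a] if bound(D, bp, b))}


def centinodes(procs, t, t_p, reach, D):
    """All centinodes for the process sequence procs in (r, t..t_p)."""
    level = {(procs[0], t)}               # case k = 0
    for i_k in procs[1:]:                 # case k > 0: bridge to (i_k, t_p)
        level = {b for theta in level
                 for b in bridges(theta, (i_k, t_p), reach, D)}
    return level


INF = math.inf
# Hypothetical bounds: 5 on channel 0->1 and 2 on channel 1->2.
D = {0: {0: 0, 1: 5, 2: 7},
     1: {0: INF, 1: 0, 2: 2},
     2: {0: INF, 1: INF, 2: 0}}
nodes = [(i, t) for i in range(3) for t in range(9)]
deliveries = [((0, 0), (1, 2))]           # early receive: bound 5, delay 2
reach = syncausal_reach(nodes, deliveries, D)

print(centinodes((0, 1, 2), 0, 4, reach, D))  # prints {(1, 2)}
print(centinodes((0, 1, 2), 0, 3, reach, D))  # prints set()
```

In this run the early receive at node (1, 2) is what makes a centinode available by time 4; one time step earlier no centinode exists, since the bound on the channel from process 1 to process 2 has not yet elapsed.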



Lemma 9 allows using centinodes and centipedes interchangeably. In- 
deed, clause 3 suggests that we can without loss of generality think of cen- 
tipedes as consisting of a sequence of centinodes. We are now ready to prove 
our main theorem, stating that the existence of a centipede is a necessary 
condition for attaining nested knowledge of an ND event: 

Theorem 6 (Knowledge Gain) Let $P$ be a deterministic protocol, and
let $r \in \mathcal{R}^{max} = \mathcal{R}(P, \gamma^{max})$. Assume that $e$ is an ND event at $(i_0,t)$ in $r$. If
$(\mathcal{R}^{max}, r, t') \models K_{i_k}K_{i_{k-1}} \cdots K_{i_1}(\mathit{occurred}(e) \wedge \mathit{ND}(e))$, then there is a centipede
for $\langle i_0, \ldots, i_k \rangle$ in $(r, t..t')$.

Proof We shall prove the contrapositive form: if no $\langle i_0, \ldots, i_{k-1}, i_k \rangle$ centinode
exists in $(r, t..t')$, then $(\mathcal{R}^{max}, r, t') \not\models K_{i_k}K_{i_{k-1}} \cdots K_{i_1}(\mathit{occurred}(e) \wedge \mathit{ND}(e))$.
We reason by induction on $k \ge 1$:

$k = 1$: This case is a rephrasing of Theorem 5: By assumption, there is no
$\langle i_0, i_1 \rangle$ centinode in $(r, t..t')$. In other words, there is no node $\theta$ bridging
$(i_0,t)$ and $(i_1,t')$. By Lemma 7 it follows that $(i_0,t) \not\rightsquigarrow (i_1,t')$. Thus,
by Theorem 5 we have that $(\mathcal{R}^{max}, r, t') \not\models K_{i_1}(\mathit{occurred}(e) \wedge \mathit{ND}(e))$,
as claimed.

$k \ge 2$: Assume inductively that the claim holds for $k-1$. Moreover, assume
that no $\langle i_0, \ldots, i_k \rangle$ centinode exists in $(r, t..t')$. For every $r' \in \mathcal{R}^{max}$ let
$C^{r'} = \{\theta_1, \ldots, \theta_c\}$ be the set of $\langle i_0, \ldots, i_{k-1} \rangle$ centinodes in $(r', t..t')$.
Observe that $\theta' \not\rightsquigarrow (i_k,t')$ in $r$ for all $\theta' \in C^{r}$, since otherwise we would
have by Lemma 7 that there is a bridging node $\theta' \rightsquigarrow \beta \dashrightarrow (i_k,t')$. But
$\beta$ would then be a $\langle i_0, \ldots, i_{k-1}, i_k \rangle$ centinode in $(r, t..t')$, contradicting
our assumption.

We consider two cases. First suppose that $(i_0,t) \in C^{r}$. Given that
$(i_0,t) \not\rightsquigarrow (i_k,t')$ we have that $(\mathcal{R}^{max}, r, t') \not\models K_{i_k}(\mathit{occurred}(e) \wedge \mathit{ND}(e))$
by Theorem 5 above, and since $K_{i_k}K_{i_{k-1}} \cdots K_{i_1}\varphi \Rightarrow K_{i_k}\varphi$
is valid, we obtain that $(\mathcal{R}^{max}, r, t') \not\models K_{i_k}K_{i_{k-1}} \cdots K_{i_1}(\mathit{occurred}(e) \wedge
\mathit{ND}(e))$, as claimed.

Next suppose that $(i_0,t) \notin C^{r}$. Let $r' \in \mathcal{R}^{max}$ be a run such that $r'$ is
identical to $r$ until (but not including) time $t$, and where

(i) the environment's actions at all nodes in $past(r,(i_k,t'))$ are identical
to those in $r$; and

(ii) all messages delivered to nodes not in $past(r,(i_k,t'))$ are delivered
at the maximal possible transmission time according to the
bounds $max_{ij}$.

To see that such a run $r'$ indeed exists in $\mathcal{R}^{max}$, we note that clause
(ii) relates to nodes outside $past(r,(i_k,t'))$ and thus by assumption
does not contradict clause (i), and that by definition all early message
receives can be delayed, independent of the run's past or concurrent
events. Since $\mathcal{R}^{max}$ contains all runs of $P$ in $\gamma^{max}$, it must include $r'$.

Showing that such a run exists repeats the arguments in Theorem 5.
From $r'$ being identical to $r$ until time $t$, from clause (i) above and from
Lemma 6 it follows that $r'_{i_k}(t') = r_{i_k}(t')$. Notice that by construction
of $r'$ we have that $\alpha \rightsquigarrow \alpha'$ holds in $r'$ only if $\alpha \rightsquigarrow \alpha'$ in $r$, and that every
early receive in $r'$ is an early receive in $r$. Considering that bounds
are universal in all runs, we obtain that every bridge node in $r'$ is also
a bridge node in $r$, and hence that $C^{r'} \subseteq C^{r}$. By definition of $r'$,
none of the nodes in the set $C^{r}$, and hence also in $C^{r'}$, experiences an
early receive in $r'$. Yet from Lemma 8 and from $(i_0,t) \notin C^{r'}$ it follows
that every node $\theta' \in C^{r'}$ must be a nontrivial bridge node in $r'$, thus
experiencing an early receive. We therefore conclude that $C^{r'} = \emptyset$.

Based on the inductive hypothesis and the definition of $C^{r'}$ we get

$(\mathcal{R}^{max}, r', t') \not\models K_{i_{k-1}} \cdots K_{i_1}(\mathit{occurred}(e) \wedge \mathit{ND}(e))$.

As $r'_{i_k}(t') = r_{i_k}(t')$, we obtain using the definition of the knowledge operator
that

$(\mathcal{R}^{max}, r, t') \not\models K_{i_k}K_{i_{k-1}} \cdots K_{i_1}(\mathit{occurred}(e) \wedge \mathit{ND}(e))$,

and we are done.

■ Theorem 6

Based on the Knowledge Gain Theorem, we can proceed to prove Theorem 4,
the Centipede Theorem. For protocols that recall responses, the
Centipede Theorem follows immediately from Theorems 6 and 1. We prove
the Centipede Theorem for arbitrary protocols by relating the general case
to that of protocols that recall responses, as we did in Theorem 3.

Proof of Theorem 4 Let $P$ be a protocol solving $OR = OR\langle e_t, \alpha_1, \ldots, \alpha_k \rangle$.
Define $P'$ to be a protocol that differs from $P$ only in that every process $i$
maintains a list called Responses, to which it adds an item $(\alpha_h, t_h)$ whenever
$i$ performs a response $\alpha_h$. (We assume w.l.o.g. that no list with this
name is used by $P$.) Notice that this list is an auxiliary variable that does
not affect the behavior of the protocol. Indeed, there is an isomorphism
between the runs of $P$ and those of $P'$, where the same nondeterministic
events and the same actions take place at all nodes of corresponding runs.
In particular, since $P$ solves OR, so does $P'$. By construction, $P'$ recalls
responses. Denote $\mathcal{R}' = \mathcal{R}(P', \gamma^{max})$. Assume that $e_t$ occurs in the run $r$
of $P$ and let $r'$ be the corresponding run of $P'$. Since $P'$ recalls responses,
we have by Theorem 1 that

$(\mathcal{R}', r', t_k) \models K_{i_k}K_{i_{k-1}} \cdots K_{i_1}\mathit{occurred}(e_t)$.

Since $e_t$ is an external input in all runs of $\mathcal{R}'$, we have that $\mathcal{R}' \models \mathit{occurred}(e_t) \Rightarrow
(\mathit{occurred}(e_t) \wedge \mathit{ND}(e_t))$. Hence,

$(\mathcal{R}', r', t_k) \models K_{i_k}K_{i_{k-1}} \cdots K_{i_1}(\mathit{occurred}(e_t) \wedge \mathit{ND}(e_t))$.

By Theorem 6 we thus obtain that there is a centipede for $\langle i_0, \ldots, i_k \rangle$ in
$(r', t..t')$. Since all actions and communication events in $r$ and in $r'$ are the
same, it follows that there is a centipede for $\langle i_0, \ldots, i_k \rangle$ in $(r, t..t')$, and we
are done. ■



3.8 Varying Nondeterminism in Message Transmission

A better grasp of the dynamics and flexibility of the centipede structure, and
of the scope of the related Knowledge Gain Theorem, is afforded by considering
two particular models, with very different characteristics. On one
extreme, we define the Asynch-delivery model to be one in which $max_{ij} = \infty$
for all channels $i \mapsto j$. On the other extreme, we consider the Fixed-delivery
model to be one in which every message on a channel $i \mapsto j$ spends exactly
$max_{ij} < \infty$ time units in transit.

We define the Asynch-delivery model as a context in which
$max_{ij} = \infty$ for all existing channels. This is a model where
processes share a global clock but communication is asynchronous. In this
case, clause (3) in the definition of syncausality cannot be used to infer
syncausality of any pair of nodes, and thus syncausality coincides with
Lamport's happened-before relation.

In this model, the Knowledge Gain Theorem then reduces to a theorem equivalent
to Chandy and Misra's Knowledge Gain Theorem for totally asynchronous
contexts [7].

Lemma 10 Let $P$ be an arbitrary protocol, and let $\mathcal{R}$ be the system of runs of $P$
in the Asynch-delivery context. Assume $e$ is an ND event occurring at $(i_0,t)$ in $r$.
If $(\mathcal{R}, r, t') \models K_{i_k} \cdots K_{i_1}(\mathit{occurred}(e) \wedge \mathit{ND}(e))$,
then there is a happened-before chain $(i_0,t) \to \cdots \to (i_k,t_k)$ in $(r, t..t')$.

Proof By the lemma's assumptions and by applying the Knowledge Gain
Theorem, we obtain that there must exist a centipede
$\langle (i_0,t), \theta_1, \ldots, \theta_{k-1}, (i_k,t') \rangle$ for $\langle i_0, i_1, \ldots, i_k \rangle$ in $(r, t..t')$.

Yet when $max_{ij} = \infty$ for all channels, we get that if $(i,t) \rightsquigarrow (j,t')$ and
$t \le t' < \infty$ then the syncausal relation cannot be based on applications of
clause (3) of the definition of syncausality. Thus, it must be that
$(i,t) \to (j,t')$. Moreover, if $t \le t' < \infty$ and $(i,t) \dashrightarrow (j,t')$ then it must be
that $i = j$.

Thus in the existing centipede it must be that for all $h < k$, $\theta_h = (i_h, t_h)$
and $\theta_h \to \theta_{h+1}$, thus providing us with a message chain linking $i_0, i_1, \ldots, i_k$
in $(r, t..t')$. ■

Figure 3.12: Collapsed centipede variations

As seen in Figure 5.1, under the Asynch-delivery model the centipede's
legs are shortened to length 0, and the syncausal relations between its body
nodes collapse into Lamport's happened-before.

The Fixed-delivery model runs opposite to the Asynch-delivery one in 
removing the nondeterministic aspect in message delivery. Thus, not only 
do processes share a global clock, but they also share knowledge of the exact 
time it takes each message to be delivered. 

We define this model as a context where $min_{ij} = max_{ij}$ for all channels.
Under this model, every message sent arrives exactly
at its related channel's bound guarantee. The syncausal relation then acquires
the same extension as that of the timing guarantee, and centipede
body nodes are collapsed into a single node, as shown in Figure 5.1.

Lemma 11 Let $P$ be an arbitrary protocol, and let $\mathcal{R}$ be the system of runs of $P$
in the Fixed-delivery context.
Fix $r$ and assume $e$ is an ND event occurring at $(i_0,t)$ in $r$.
If $(\mathcal{R}, r, t') \models K_{i_k}K_{i_{k-1}} \cdots K_{i_1}(\mathit{occurred}(e) \wedge \mathit{ND}(e))$,
then $(i_0,t) \dashrightarrow (i_h,t')$ for all $h \le k$.

Proof Note that in the Fixed-delivery context we have that $(i,s) \rightsquigarrow (j,s')$ iff $(i,s) \dashrightarrow
(j,s')$, for all processes $i,j$ and times $s,s'$. By the Knowledge Gain Theorem
there must exist a centipede for $\langle i_0, i_1, \ldots, i_k \rangle$ in $(r, t..t')$. In other words, for
each $h \le k$ there exists some $\theta_h$ such that $(i_0,t) \rightsquigarrow \theta_h \dashrightarrow (i_h,t')$. Thus we
get that $(i_0,t) \dashrightarrow \theta_h \dashrightarrow (i_h,t')$ and hence $(i_0,t) \dashrightarrow (i_h,t')$ as required. ■

3.9 Sufficiency of Centipedes for Knowledge Gain



The Knowledge Gain and Centipede Theorems (Theorems 6 and 4 respectively)
state that the centipede structure is necessary for gaining nested
knowledge of the occurrence of nondeterministic events and for solving the OR
problem. These results hold in a strong sense, regardless of the protocol
used by the processes. Our goal in this section is to show that these results
are tight.

We cannot prove that centipedes are sufficient means to achieve these
ends for all protocols, because the knowledge actually transferred by messages
depends on the protocol, and may be insufficient.* The most we can
do in order to prove the tightness of our definitions is to show that there
exist specific protocols under which centipedes are sufficient for knowledge
gain. We will do so for the following version of the full information protocol.

Definition 14 (Full-information Protocol) In the full information protocol
for synchronous systems, denoted fip, every process $i \in \mathbb{P}$ sends its local
state on each of its outgoing channels at every time step. Moreover, each
process retains a history of every event that has taken place locally, and every
message received, along with their times of occurrence.
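
To make the behavior of fip concrete, here is a minimal Python simulation sketch. It is a simplification under stated assumptions: local states are modeled as sets of timestamped external inputs only (the definition also records message receipts), within a time step a process first receives, then records inputs, then sends, and the environment's choice of transmission times is modeled by a hypothetical `delay` function. The name `run_fip` and the three-process line network are illustrative.

```python
from collections import defaultdict


def run_fip(procs, channels, delay, inputs, horizon):
    """Simulate a simplified full-information protocol.

    channels -- iterable of directed channels (i, j)
    delay    -- delay(i, j, t): transit time (>= 1) of the message sent on
                channel (i, j) at time t, as chosen by the environment
    inputs   -- dict mapping (process, time) to an external input label
    Returns history, where history[t][i] is the set of timestamped facts
    held in the local state of process i at time t.
    """
    state = {i: set() for i in procs}
    in_transit = defaultdict(list)       # arrival time -> [(receiver, payload)]
    history = []
    for t in range(horizon + 1):
        for receiver, payload in in_transit.pop(t, []):
            state[receiver] |= payload   # absorb the sender's whole state
        for i in procs:
            if (i, t) in inputs:
                state[i].add((i, t, inputs[(i, t)]))
        for i, j in channels:            # send local state on every channel
            in_transit[t + delay(i, j, t)].append((j, frozenset(state[i])))
        history.append({i: frozenset(state[i]) for i in procs})
    return history


procs = ["a", "b", "c"]
channels = [("a", "b"), ("b", "c")]
fixed = {("a", "b"): 2, ("b", "c"): 1}   # hypothetical transmission times
history = run_fip(procs, channels, lambda i, j, t: fixed[(i, j)],
                  inputs={("a", 1): "e"}, horizon=5)

fact = ("a", 1, "e")
print(fact in history[3]["c"], fact in history[4]["c"])  # prints False True
```

The input occurring at process a at time 1 reaches c exactly when the message chain through b completes, at time 1 + 2 + 1 = 4, and no earlier.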

We will denote with $\mathcal{R}^{fip}$ the system $\mathcal{R}(fip, \gamma^{max})$. In fip the processes
convey all of their knowledge as fast as they can. Roughly speaking, knowledge
is spread in the system as fast as possible, given the transmission times
allowed by the environment in the given run. While our stated goal is to
prove that under fip the Knowledge Gain and Centipede Theorems are tight,
our results will be somewhat stronger. These theorems show that centipedes
are necessary for knowledge gain regarding nondeterministic events. As we
shall see, in the context of fip, there is no need to restrict attention to
nondeterministic events. The causal structures in question are sufficient for
knowledge gain regarding general events (and more general facts).

Three simple but very useful properties of the timestamping operator
$\mathsf{At}_t$ are captured by the following immediate lemma:

Lemma 12 For every formula $\varphi$ and times $t, t'$, the following formulas are
valid in $\mathcal{R}^{max}$:

TS1 $\models \mathsf{At}_t(\varphi \leftrightarrow \mathsf{At}_t\varphi)$

TS2 $\models \mathsf{At}_{t'}\mathsf{At}_t\varphi \leftrightarrow \mathsf{At}_t\varphi$

TS3 $\models \mathsf{At}_t(K_i\varphi \leftrightarrow K_i\mathsf{At}_t\varphi)$

* See [42] and [34] for some interesting observations on the connections between
protocols and message meanings.

Notice that, as described, processes following fip have perfect recall.
Since they maintain their local histories, they do not forget what they knew.
Of course, the truth of a transient fact can change over time. Knowing that
the time is 3 is possible at time 3 but not at time 4. But whether $\varphi$ held
at time $t$ does not change. More formally, perfect recall and the presence of
clocks give us the following knowledge-preservation property, which states
that if at time $t$ process $i$ knows that $\varphi$, then at every future time point the
process will know (or remember) that it knew $\varphi$ at $t$.

Lemma 13 If $t' > t$, then the following formula is valid in $\mathcal{R}^{fip}$:
TS4 $\models \mathsf{At}_tK_i\varphi \Rightarrow \mathsf{At}_{t'}K_i\mathsf{At}_tK_i\varphi$.

The proof of the following lemma will make use of the single-step relation $\rightsquigarrow_1$,
which results from a single application of one of the clauses (1), (2), or (3)
of syncausality.

We are now ready to show that syncausality alone is sufficient to ensure
knowledge transfer under fip. Figure 3.13 below provides a graphical visualization
of what Lemma 14 shows. Namely, that if $K_i\varphi$ holds at time $t$ and
$(i,t) \rightsquigarrow (j,t')$, then at $t'$ process $j$ knows that at time $t$ process $i$ knew that
$\varphi$.

Lemma 14 If $(\mathcal{R}^{fip}, r, t) \models K_i\varphi$ and $(i,t) \rightsquigarrow (j,t')$ in $r$, then
$(\mathcal{R}^{fip}, r, t') \models K_j(\mathsf{At}_tK_i\varphi)$.

Figure 3.13: The syncausal relation and reflected knowledge state
Proof Given that $(i,t) \rightsquigarrow (j,t')$, we have that

$(i,t) = (j_0,s_0) \rightsquigarrow_1 (j_1,s_1) \rightsquigarrow_1 \cdots \rightsquigarrow_1 (j_n,s_n) = (j,t')$.

We prove the claim by induction on $n$.

$n = 0$: In this case, $j = i$ and $t = t'$. Positive introspection gives us
$(\mathcal{R}^{fip}, r, t) \models K_iK_i\varphi$. Let $\varphi' = K_i\varphi$, and apply TS3 based on $(\mathcal{R}^{fip}, r, t) \models
K_i\varphi'$. This gives us $(\mathcal{R}^{fip}, r, t') \models K_i\mathsf{At}_tK_i\varphi$ as required.

$n > 0$: Assume inductively that $(\mathcal{R}^{fip}, r, s_{n-1}) \models K_{j_{n-1}}(\mathsf{At}_tK_i\varphi)$. By
definition of the sequence we have that $(j_{n-1},s_{n-1}) \rightsquigarrow_1 (j_n,s_n)$. By
definition of $\rightsquigarrow_1$ there are three options to consider, corresponding to
clauses (1)-(3) of syncausality:

1. $j_{n-1} = j_n$ and $s_{n-1} \le s_n$. In this case by TS4 we have that

$(\mathcal{R}^{fip}, r, s_{n-1}) \models \mathsf{At}_{s_n}K_{j_n}\mathsf{At}_{s_{n-1}}K_{j_{n-1}}(\mathsf{At}_tK_i\varphi)$,

which is reduced to

$(\mathcal{R}^{fip}, r, s_n) \models K_{j_n}\mathsf{At}_{s_{n-1}}K_{j_{n-1}}(\mathsf{At}_tK_i\varphi)$.

Applying the Knowledge Axiom and TS2 to every $(r',s_n) \sim_{j_n}
(r,s_n)$ we obtain $(\mathcal{R}^{fip}, r, s_n) \models K_{j_n}(\mathsf{At}_tK_i\varphi)$.

2. Process $j_{n-1}$ sends a message in round $s_{n-1}$, which is received
by $j_n$ in round $s_n$. Since the protocol used is fip, message contents
consist of the local state of the sender. Based on the inductive
assumption we get that $(\mathcal{R}^{fip}, r, s_n) \models K_{j_n}\mathsf{At}_{s_{n-1}}K_{j_{n-1}}(\mathsf{At}_tK_i\varphi)$.
This implies $(\mathcal{R}^{fip}, r, s_n) \models K_{j_n}(\mathsf{At}_tK_i\varphi)$ as in case (1).

3. $(j_{n-1},j_n)$ is a network channel and no message is sent by $j_{n-1}$ to
$j_n$ at time $s_n$: Since in fip every process sends its local state to
all neighbors in every round, this option is not viable in $r$.

■

The lemma is proved based on the fact that in fip processes constantly
send explicit messages on all outgoing channels, so a syncausal chain in fip
never contains a link that is based on clause (3) of syncausality. Hence, if
$(i,t) \rightsquigarrow (j,t')$ in $r$, then there must exist a chain of "real" messages linking
the two nodes. Messages in fip contain the local state of the sender. Hence,
$i$'s local state at time $t$ is propagated through the message chain until it
reaches $j$.

Lemma 15 below makes use of the further guarantees made by the $\dashrightarrow$
relation. Recall that $(i,t) \dashrightarrow (j,t')$ implies that $(i,t) \rightsquigarrow (j,t')$, by Lemma 3.
Moreover, the $\dashrightarrow$ relation is determined by the context $\gamma^{max}$ alone, so that
if $(i,t) \dashrightarrow (j,t')$ holds in a run $r$ of the system, it will do so in all runs of the
system. Thus, process $i$ knows already at time $t$ that its current knowledge
will be available to $j$ at $t + D(i,j)$. This situation is depicted in Figure 3.14.
Translated into English, the figure shows that if at time $t$ process $i$ knows
that $\varphi$, and if $(i,t) \dashrightarrow (j,t')$, then at time $t$ process $i$ also knows that at
time $t'$ process $j$ will know that at time $t$ process $i$ knew that $\varphi$.

Lemma 15 If $(\mathcal{R}^{fip}, r, t) \models K_i\varphi$ and $(i,t) \dashrightarrow (j,t')$, then
$(\mathcal{R}^{fip}, r, t) \models K_i\mathsf{At}_{t'}K_j\mathsf{At}_tK_i\varphi$.

Figure 3.14: The timing guarantee and reflected knowledge state

Proof Since $(i,t) \dashrightarrow (j,t')$, and since this property is determined by the
network independently of the particular run $r$, we have that $(i,t) \dashrightarrow (j,t')$
in every run $r' \in \mathcal{R}^{fip}$. Moreover, by Lemma 3 we have that $(i,t) \rightsquigarrow (j,t')$ in
every such run. Applying Lemma 14 to every run $r'$ such that $(r',t) \sim_i (r,t)$
we obtain that $(\mathcal{R}^{fip}, r', t') \models K_j(\mathsf{At}_tK_i\varphi)$. By TS1 we obtain $(\mathcal{R}^{fip}, r', t') \models
\mathsf{At}_{t'}K_j(\mathsf{At}_tK_i\varphi)$. By choice of runs $r'$ we now conclude that $(\mathcal{R}^{fip}, r, t) \models
K_i\mathsf{At}_{t'}K_j\mathsf{At}_tK_i\varphi$. ■

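Lemmas 14 and 15 capture essential epistemic aspects of the fip in the
synchronous context $\gamma^{max}$, based in part on perfect recall. Composing them
gives us Lemma 16, which is at the heart of the proof of the sufficiency
Theorem 7 below. The causal and epistemic states described by the lemma
are shown in Figure 3.15.

Lemma 16 If $(\mathcal{R}^{fip}, r, t_i) \models K_i\varphi$ and $(i,t_i) \rightsquigarrow (j,t_j) \dashrightarrow (\ell,t_\ell)$
then $(\mathcal{R}^{fip}, r, t_j) \models K_j\mathsf{At}_{t_\ell}K_\ell\mathsf{At}_{t_i}K_i\varphi$.

Proof Since $(\mathcal{R}^{fip}, r, t_i) \models K_i\varphi$ and $(i,t_i) \rightsquigarrow (j,t_j)$, Lemma 14 gives us
$(\mathcal{R}^{fip}, r, t_j) \models K_j\mathsf{At}_{t_i}K_i\varphi$. Now as $(j,t_j) \dashrightarrow (\ell,t_\ell)$, using Lemma 15 we get
$(\mathcal{R}^{fip}, r, t_j) \models K_j\mathsf{At}_{t_\ell}K_\ell\mathsf{At}_{t_j}\mathsf{At}_{t_i}K_i\varphi$. Finally, applying validity TS2 reduces
the result to $(\mathcal{R}^{fip}, r, t_j) \models K_j\mathsf{At}_{t_\ell}K_\ell\mathsf{At}_{t_i}K_i\varphi$ as required. ■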

We are now ready to prove that in fip, the existence of a centipede is
sufficient for nested knowledge gain.

Figure 3.15: Syncausal relation and timing guarantee, with induced knowledge
state

Theorem 7 If $(\mathcal{R}^{fip}, r, t) \models K_{i_0}\varphi$ and there is a centipede for $\langle i_0, \ldots, i_k \rangle$ in
$(r, t..t')$, then $(\mathcal{R}^{fip}, r, t') \models K_{i_k}K_{i_{k-1}} \cdots K_{i_1}K_{i_0}(\mathsf{At}_tK_{i_0}\varphi)$.

Proof Let $\langle (j_0,t_0), \ldots, (j_k,t_k) \rangle$ be a centipede for $\langle i_0, \ldots, i_k \rangle$ in $(r, t..t')$, such
that $(i_0,t) = (j_0,t_0)$ and $(j_k,t_k) = (i_k,t')$, as seen in Figure 3.16.

Figure 3.16: Centipede for Theorem 7

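We show by proceeding inductively on each "body" node $(j_h,t_h)$ for
$0 \le h \le k$ that $(\mathcal{R}^{fip}, r, t_h) \models K_{j_h}(\mathsf{At}_{t'}K_{i_h}K_{i_{h-1}} \cdots K_{i_0}(\mathsf{At}_tK_{i_0}\varphi))$. Recall
that the global time $t$ appears as a component of all local states. Thus,
$(r^1,t^1) \sim_i (r^2,t^2)$ is possible only if $t^1 = t^2$.

$h = 0$: As $(i_0,t) \dashrightarrow (i_0,t')$, applying Lemma 15 to assumption $(\mathcal{R}^{fip}, r, t) \models
K_{i_0}\varphi$ gives us $(\mathcal{R}^{fip}, r, t) \models K_{i_0}\mathsf{At}_{t'}K_{i_0}\mathsf{At}_tK_{i_0}\varphi$. Since $i_0 = j_0$ we get
$(\mathcal{R}^{fip}, r, t_0) \models K_{j_0}(\mathsf{At}_{t'}K_{i_0}(\mathsf{At}_tK_{i_0}\varphi))$.

$h > 0$: Assume for $h-1$ and show for $h$. For clarity, define

$\psi_h = K_{i_h} \cdots K_{i_0}(\mathsf{At}_tK_{i_0}\varphi)$.

The inductive assumption gives us that

$(\mathcal{R}^{fip}, r, t_{h-1}) \models K_{j_{h-1}}(\mathsf{At}_{t'}\psi_{h-1})$.

By definition of the centipede we have that $(j_{h-1},t_{h-1}) \rightsquigarrow (j_h,t_h)$ and
that $(j_h,t_h) \dashrightarrow (i_h,t')$. By Lemma 16 we have that

$(\mathcal{R}^{fip}, r, t_h) \models K_{j_h}\mathsf{At}_{t'}K_{i_h}\mathsf{At}_{t_{h-1}}K_{j_{h-1}}(\mathsf{At}_{t'}\psi_{h-1})$.

Using validity TS3 we reduce this to

$(\mathcal{R}^{fip}, r, t_h) \models K_{j_h}\mathsf{At}_{t'}K_{i_h}\psi_{h-1}$,

which is the required $(\mathcal{R}^{fip}, r, t_h) \models K_{j_h}(\mathsf{At}_{t'}\psi_h)$. This concludes the
inductive proof.

In particular, we obtain $(\mathcal{R}^{fip}, r, t') \models K_{i_k}(\mathsf{At}_{t'}K_{i_k}K_{i_{k-1}} \cdots K_{i_0}(\mathsf{At}_tK_{i_0}\varphi))$
since $j_k = i_k$ and $t_k = t'$. Using the Knowledge Axiom we obtain that
$(\mathcal{R}^{fip}, r, t') \models \mathsf{At}_{t'}K_{i_k}K_{i_{k-1}} \cdots K_{i_0}(\mathsf{At}_tK_{i_0}\varphi)$. Finally, using TS1, $(\mathcal{R}^{fip}, r, t') \models
K_{i_k}K_{i_{k-1}} \cdots K_{i_0}(\mathsf{At}_tK_{i_0}\varphi)$ as desired. ■ Theorem 7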

Theorem 7 proceeds by tracing the knowledge states of the centipede's
"body" nodes.† These nodes provide a communication path that "feeds"
the endpoints $i_1$, $i_2$, etc. A subtle point is that each body node $\theta_h$ already
knows that its related endpoint $i_h$ will know by $t'$ what it (i.e. $\theta_h$) knows.
This is information that $\theta_h$ can also pass on to the next body node $\theta_{h+1}$.

We obtain the following result by immediate application of Lemma 1 to
Theorem 7.

Theorem 8 (Nested Knowledge Sufficiency) Let $P$ be an fip protocol
that is also non-hesitant for $OR = \langle e_t, \alpha_1, \ldots, \alpha_k \rangle$. If for every $r \in \mathcal{R}^{fip} =
\mathcal{R}(P, \gamma^{max})$ in which $e$ is an ND event at $(i_0,t)$ there exists time $t'$ such that
a centipede for $\langle i_0, \ldots, i_k \rangle$ exists in $(r, t..t')$, then $P$ solves OR.

† The knowledge state of a node is a convenient abuse of language that refers to the
knowledge state of the process related to the node, at the time related to the node.

3.10 Conclusions

This chapter starts out by introducing and discussing several new concepts
related to causality in synchronous systems. Thus, the bound guarantee
and syncausality relations lead up to the centipede structure. Then the
formal theory is developed that results in the Knowledge Gain Theorem,
and thence the Centipede Theorem. Finally, it is shown that the causal
structures defined are tight, in the sense that there exists a protocol where
the existence of a centipede is sufficient for knowledge gain and for solving
the OR problem.

Our results all hold in particular in the case in which $max_{ij} = \infty$ for all
channels, so that communication is asynchronous (although processes share
the global clock and can move at each step). Because communication is
asynchronous, bound guarantees are useless in this setting. Syncausality
reduces to Lamport's happened-before, and all possible centipedes collapse
to message chains. Thus, our results also apply to such contexts, reproving
Chandy and Misra's Knowledge Gain Theorem in a slightly more general setting.
Asynchrony of communication alone suffices for this type of implosion.

How knowledge actually evolves in a system will depend on the particular
protocol used. As a first study of the role that protocols play in
determining information flow in the synchronous contexts $\gamma^{max}$, we have
analyzed the full-information protocol and have shown that the definitions
for syncausality and centipede are not only necessary but also sufficient for
nested knowledge under such protocols. If one adds the non-hesitance property,
then the causal structures also suffice for solving the OR problem. It
follows that our characterization of coordination in terms of syncausality and
centipedes is, in a precise sense, tight.






Chapter 4 

Gaining Common Knowledge 



4.1 Introduction 

This chapter analyzes the causal relations that lead to common knowledge 
gain and to simultaneous coordination. A well-known result [23] shows that 
common knowledge cannot be gained in asynchronous systems. Common 
knowledge can, however, be gained in synchronous ones. As such, the results 
in this chapter have no counterpart in asynchronous systems. The state of 
common knowledge has been shown to play an important role in agreements 
and in coordinating simultaneous actions [23, 14, 15]. 

As before, we provide a more concrete motivation for our investigation 
by considering the Simultaneous Response problem, defined in Section 2.1. 
Consider the scenario depicted in the following example. 



Example 5 The Wikileaks whistle blowing site is about to uncover yet another
state secret. It strikes a bargain with El Pais and The New York
Times. As soon as the secret becomes available to Wikileaks (the exact timing
depends upon an external source and is thus unknown), it will pass on the
information to the papers using time stamped messages. The contract with
Wikileaks states that both papers are to publish the scoop simultaneously,
or not at all. The parties involved communicate over the network shown in
Figure 4.1. Note that the scenario sketches out an instance of SR where a
spontaneous event at Wikileaks is to be followed by a pair of simultaneous
publication events.

Figure 4.1: The network of Example 5



Figure 4.2: Example 5. (a) Publishing simultaneously at t + 10. (b) Publishing simultaneously at t + 9.

Suppose that the secret becomes available at time t and that Wikileaks
sends messages to the NYT and El Pais right after (let's keep the Middleman
out of it for now).

In Figure 4.2a, Wikileaks' messages to the NYT and El Pais arrive at
times t + 3 and t + 5 respectively. The editors both wait until t + 10 before
simultaneously publishing the secret.

Figure 4.2b offers an alternative scenario. Here the Middleman is also
notified by Wikileaks, and it sends on messages to both papers. Despite the
fact that the messages sent by Wikileaks to the papers both arrive by t + 3,
and that the Middleman's messages arrive by t + 8, the papers must wait
until t + 9 in order to ensure simultaneous publication of the scoop.

□



Recall that, given Theorem 2, the simultaneous response requirement is
reduced to a requirement for common knowledge of the occurrence of the
ND event. Example 5 is thus best analyzed in terms of knowledge gain.
In Figure 4.2a, as soon as the message to El Pais arrives, we have that
$K_E\mathit{Secret} \wedge K_N\mathit{Secret}$. But we also have $K_EK_N\mathit{Secret}$, since the Spanish
editor can work out that the send time was $t$ and that a message to the NYT
will have arrived by $t + 3$ at the latest. $K_NK_E\mathit{Secret}$ does not hold however,
because the message to El Pais may take longer than 5. By waiting until
$t + 10$ we also have $K_NK_E\mathit{Secret}$. Since both bounds have been reached, and
since the bounds are common knowledge, we also get that $K_EK_NK_E\mathit{Secret}$,
$K_NK_EK_NK_E\mathit{Secret}$, etc. As this ever lengthening nesting of knowledge
points out, at $t + 10$ the group of papers {NYT, El Pais} has gained common
knowledge of the secret, $C_{\{E,N\}}\mathit{Secret}$.

Figure 4.3: A centibroom

In Figure 4.2b similar calculations will convince the reader that, based on
the Middleman's messages, common knowledge arises already at $t + 9$. Note
that mutual knowledge ($K_E\mathit{Secret} \wedge K_N\mathit{Secret}$, established at time $t + 3$) and
even mutual nested knowledge ($K_NK_E\mathit{Secret} \wedge K_EK_N\mathit{Secret}$, established at
time $t + 7$) do not necessarily lead to common knowledge. For example, at
$t + 7$ $K_EK_NK_E\mathit{Secret}$ does not hold: the Spanish editor is thinking that
as far as the editors in New York are concerned, a message from Wikileaks
to El Pais could arrive as late as $t + 10$, and that the message from the
Middleman may not have arrived in New York as yet.

4.2 Centibrooms 

As illustrated above, the existence of a centipede, even under the best of
terms where messages contain all relevant information, may not suffice for
ensuring common knowledge gain. The analysis suggests that it is only
when a node exists from which messages are guaranteed to have arrived at
the sites of all parties concerned, that common knowledge may arise.
We now define a communication structure that echoes this intuition.

Definition 15 (Centibroom) Let $t \le t'$ and $G \subseteq \mathbb{P}$. Node $\theta$ is a centibroom
for $\langle i_0, G \rangle$ in $(r, t..t')$ if $(i_0,t) \rightsquigarrow \theta$ and $\theta \dashrightarrow (i_h,t')$ holds for all
$i_h \in G$.

The centibroom node $\theta$ is syncausally connected to the originating node
of the nondeterministic event, which enables it to be informed of the event's
occurrence. Node $\theta$ is also connected by bound guarantees to the time
$t'$ nodes of all processes in $G$. Intuitively, this makes it possible for
$\theta$ to guarantee that a message sent to any $i_h \in G$ will have
arrived by $t'$. Note that, once again, Figures 4.2a and 4.2b contain
centibroom structures (in both figures all communication that is not a part
of the centibroom is dimmed out).
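The definition can be turned into a small search procedure. The following is a minimal sketch, not the thesis's own construction, under two assumptions that are ours: a bound guarantee $(i,u) \dashrightarrow (j,v)$ is taken to hold exactly when $u + D_{ij} \le v$, where $D_{ij}$ is the shortest-path distance over the channel bounds, and syncausality is approximated as reachability through local succession, actual deliveries and bound guarantees. The names `distances` and `centibrooms` are hypothetical. The sample bounds (3 for the NYT, 10 for El País) are the ones quoted in the Wikileaks example.

```python
from collections import deque
from math import inf


def distances(procs, bounds):
    """All-pairs shortest distances D[i][j] over the channel bounds (Floyd-Warshall)."""
    D = {i: {j: 0 if i == j else bounds.get((i, j), inf) for j in procs} for i in procs}
    for m in procs:
        for i in procs:
            for j in procs:
                D[i][j] = min(D[i][j], D[i][m] + D[m][j])
    return D


def centibrooms(procs, bounds, deliveries, i0, t, group, t_end):
    """Nodes theta with (i0, t) ~> theta and theta --> (g, t_end) for every g in group.

    deliveries is a list of ((sender, send_time), (receiver, receive_time)) pairs.
    Assumption (ours): (p, u) --> (q, v) holds iff u + D[p][q] <= v.
    """
    D = distances(procs, bounds)
    seen, queue = {(i0, t)}, deque([(i0, t)])
    while queue:
        p, u = queue.popleft()
        successors = [(p, u + 1)]                                         # local succession
        successors += [dst for src, dst in deliveries if src == (p, u)]   # actual deliveries
        successors += [(q, u + D[p][q]) for q in procs if D[p][q] < inf]  # bound guarantees
        for node in successors:
            if node[1] <= t_end and node not in seen:
                seen.add(node)
                queue.append(node)
    # keep the reachable nodes that are bound-guaranteed to reach every member at t_end
    return sorted(n for n in seen if all(n[1] + D[n[0]][g] <= t_end for g in group))


procs = ["W", "N", "E"]                       # Wikileaks, NYT, El Pais
bounds = {("W", "N"): 3, ("W", "E"): 10}
print(centibrooms(procs, bounds, [], "W", 0, {"N", "E"}, 10))
```

With these bounds and no early deliveries, the only centibroom node for the interval ending at time 10 is the sending node itself, and no centibroom exists for any shorter interval, in line with the discussion of Figure 4.2a.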

The Centibroom Theorem, formulated below and proved in the next 
section, shows that indeed in order to coordinate a simultaneous response, 
a centibroom must exist that connects the responding sites to the triggering 
one. The Centibroom Theorem can be seen as an extension of the Centipede 
Theorem that applies to the SR problem. 

Theorem 9 (Centibroom Theorem) Let $P$ be a protocol solving $SR =
\langle e_t, \alpha_1, \ldots, \alpha_k \rangle$ in $\gamma^{\max}$ and
assume that $e_t$ occurs at $(i_0, t)$ in $r \in \mathcal{R}^{\max}$. If
the response actions are performed at time $t'$ in $r$, then there is a
centibroom for $\langle i_0, G \rangle$ in $(r, t..t')$.

4.3 Common Knowledge Requires Centibrooms 

Clearly, centibrooms are simpler structures than general centipedes. Notice,
however, that a centibroom for $G = \{i_1, \ldots, i_k\}$ can be considered
as a condensed representation of infinitely many centipedes, each of which
can support knowledge gain of a particular formula. More concretely, we have
the following.

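Lemma 17 Let $G \subseteq \mathbb{P}$, and let $\theta$ be a centibroom for
$\langle i_0, G \rangle$ in $(r, t..t')$. Then for every sequence
$\langle i_1, \ldots, i_k \rangle \in G^k$ of processes in $G$, the sequence
$(i_0, t) \cdot \theta^k$ (where $\theta$ repeats $k$ times) is a centipede
for $\langle i_0, \ldots, i_k \rangle$ in $(r, t..t')$.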

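Proof Fix a sequence $C = \langle i_1, \ldots, i_k \rangle \in G^k$. The
sequence $\langle (i_0, t), \theta, \ldots, \theta, (i_k, t') \rangle$, with
$k-1$ repetitions of $\theta$, is a centipede for $C$, since:

• $\theta$ is a centibroom for $\langle i_0, G \rangle$, so
  $(i_0, t) \rightsquigarrow \theta$, and

• $\theta \dashrightarrow (i_k, t')$ implies
  $\theta \rightsquigarrow (i_k, t')$, and

• $\theta \rightsquigarrow \theta$ due to reflexivity of $\rightsquigarrow$;
  finally

• $\theta$ is a centibroom for $\langle i_0, G \rangle$, so
  $\theta \dashrightarrow (i_h, t')$ for all $1 \le h \le k-1$.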
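As a small worked instance of the construction in the proof (the two-process group and the particular sequence are chosen here only for illustration), take $G = \{a, b\}$ and the sequence $\langle a, b, a \rangle$, so that $k = 3$ and $\theta$ is repeated $k - 1 = 2$ times:

```latex
\[
  \langle (i_0,t),\ \theta,\ \theta,\ (a,t') \rangle ,
  \qquad
  (i_0,t) \rightsquigarrow \theta \rightsquigarrow \theta \rightsquigarrow (a,t'),
  \qquad
  \theta \dashrightarrow (a,t'),\quad \theta \dashrightarrow (b,t') .
\]
```

The two bound guarantees on the right are exactly the "legs" required for $i_1 = a$ and $i_2 = b$, and both are supplied by the single centibroom node.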
Notice that Lemma 17 does not bound the value of $k$, nor does it restrict
the possibility of repetitions in the sequence
$\langle i_1, \ldots, i_k \rangle$ in question. We are now ready to show
that the centibroom serves as the structure underlying common knowledge.

Theorem 10 (Common Knowledge Gain) Let $P$ be an arbitrary protocol, let
$G \subseteq \mathbb{P}$, and let $r \in \mathcal{R}^{\max}$. Assume that
$e$ is an ND event at $(i_0, t)$ in $r$. If $t' > t$ and
$(\mathcal{R}^{\max}, r, t') \models C_G(\mathsf{occurred}(e) \wedge
\mathsf{ND}(e))$, then there is a centibroom $\theta$ for
$\langle i_0, G \rangle$ in $(r, t..t')$.

Proof Assume the notations and conditions of the theorem. Denote $G =
\{i_1, \ldots, i_k\}$ and $d = t' - t$. Since $(\mathcal{R}^{\max}, r, t')
\models C_G(\mathsf{occurred}(e) \wedge \mathsf{ND}(e))$ we have by
definition of common knowledge that

$(\mathcal{R}^{\max}, r, t') \models E_G^{k(d+1)}(\mathsf{occurred}(e)
\wedge \mathsf{ND}(e))$.

In particular, this implies that

$(\mathcal{R}^{\max}, r, t') \models (K_{i_k} \cdots K_{i_1})^{d+1}
(\mathsf{occurred}(e) \wedge \mathsf{ND}(e))$,

where $(K_{i_k} \cdots K_{i_1})^{d+1}$ stands for $d+1$ consecutive copies
of $K_{i_k} \cdots K_{i_1}$. By the Knowledge Gain Theorem 6, there is a
corresponding centipede $\sigma = \langle \theta_0, \theta_1, \ldots,
\theta_{k(d+1)} \rangle$ in $(r, t..t')$. Denote $\theta_h = (i_h, t_h)$ for
all $0 \le h \le k \cdot (d+1)$. Recall that, by definition,
$\theta_h \rightsquigarrow \theta_{h+1}$ holds for all $h < k \cdot (d+1)$.
By Lemma 4 we obtain that if $\theta_h \ne \theta_{h+1}$ then
$t_h < t_{h+1}$. It follows that there can be at most $d+1$ distinct nodes
$\alpha_1, \alpha_2, \ldots$ in $\sigma$. Every $\alpha_h$ represents a
segment $\theta_x, \ldots, \theta_{x+s}$ of the nodes in $\sigma$. By the
pigeonhole principle, one of the $\alpha$'s must represent a segment
consisting of at least $k$ of the $\theta$'s in $\sigma$. Denoting this node
by $\alpha$, we obtain that $\alpha \dashrightarrow (i_h, t')$ for every
$i_h \in G$. Moreover, by definition of the centipede and transitivity of
$\rightsquigarrow$ we have that $(i_0, t) \rightsquigarrow \alpha$. It
follows that $\alpha$ is a centibroom for $\langle i_0, G \rangle$ in
$(r, t..t')$.

$\Box_{\text{Theorem 10}}$

The proof of Theorem 10 is based on the Knowledge Gain Theorem 6.
Recall that $C_G\varphi$ implies arbitrarily deeply nested knowledge of
$\varphi$. Every such nested knowledge formula implies the existence of a
centipede. A nested knowledge formula is constructed whose centipede has
sufficiently many nodes that at least one of them must be a centibroom for
$G$ at $t'$.

In Chapter 3 we defined the centinode, which is an instance of the cen- 
tipede whose every "body" node is a bridge to the related "leg" node. We 
now similarly identify and prove the existence of a bridging centibroom. 

Definition 16 (Bridging centibroom) Let $t < t'$ and
$G \subseteq \mathbb{P}$. Node $\theta$ is a bridging centibroom for
$\langle i_0, G \rangle$ in $(r, t..t')$ if

• $\theta$ is a centibroom for $\langle i_0, G \rangle$ in $(r, t..t')$; and

• $\theta$ bridges $(i_0, t)$ and $g$ for every $g \in G$.

Lemma 18 Fix $r \in \mathcal{R}^{\max}$ and assume that $\theta$ is a
centibroom for $\langle i_0, G \rangle$ in $(r, t..t')$. Then there exists a
node $\theta'$ that is a bridging centibroom for $\langle i_0, G \rangle$
in $(r, t..t')$.



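Proof By Lemma 7 there exists a node $\psi$ such that $\psi$ bridges
$(i_0, t)$ and $\theta$. Node $\psi$ is a bridging centibroom since

• $(i_0, t) \rightsquigarrow \psi \dashrightarrow \theta \dashrightarrow g$
  implies $(i_0, t) \rightsquigarrow \psi \dashrightarrow g$ for all
  $g \in G$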

• $\alpha \rightsquigarrow \psi' \dashrightarrow \psi$ implies
  $\psi' = \psi$ by definition of bridge.



Theorem 10 shows that common knowledge can arise in synchronous
systems only when there exists a centibroom structure, centered about the
centibroom node. The above Lemma 18, together with Lemma 8, points
out that there must exist a bridging centibroom for the group, in which a
nondeterministic pivotal event, either an early receive or possibly an
external input when $\psi = (i_0, t)$, occurs. This demonstrates that the
nature of common knowledge is finitistic, despite its familiar definition
being based on an infinite conjunction of facts. This phenomenon is
consistent with the analysis of common knowledge in the work on
fault-tolerance [13, 36, 32]. There, too, common knowledge arises at some
time $t'$ exactly if there is some property $S$ of the correct nodes that
ensures that all processes will know by time $t'$ that the property $S$ held
in the run.

We remark that Theorem 10 relates to a familiar situation involving 
the evolution of knowledge in broadcasts. In a flooding protocol or a ra- 
dio broadcast, for example, the contents being broadcast become common 
knowledge to a growing set of participants with time. Typically, after a time 
interval equivalent to the diameter of the system, the contents can become 
common knowledge to all processes in the system. 

The proof of Theorem 9 is now immediate: we show that the existence of 
a centibroom is a necessary condition for solving the Simultaneous Response 
problem by applying the Common Knowledge Theorem 10 to Theorem 2. 
4.4 The Simultaneous Global Snapshot Protocol



Before exploring further the theoretical implications for the centibroom 
structure, we pause to consider a possible application.^ 

A well known application for Lamport's causal relation is the global
snapshot algorithm, proposed by Chandy and Lamport in [6]. This algorithm is
used to record a consistent global state in asynchronous systems. A global
snapshot of the system at a given run $r$ and time $t$, which we will denote
with $Snap(r, t)$, consists of records of the local states of all processes
in the system, and of the communication channels, at that point in the run.
Technically, communication channels do not possess a memory component,
so their state must be reconstructed by the processes. Interestingly, the
Chandy-Lamport algorithm cannot ensure that the global snapshot that it
actually records is in fact a global state in the current run. No protocol
can grant such assurances in an asynchronous system. Rather, the algorithm
ensures that the recorded snapshot is consistent with the current run in the
following sense:

Definition 17 (Snapshot consistency) Fix $r \in \mathcal{R}(P, \gamma)$ for
arbitrary protocol $P$ and context $\gamma$. Snapshot $S$ is consistent with
the interval $[t_s, t_e]$ of $r$ if there exists
$r' \in \mathcal{R}(P, \gamma)$ and times $t'_s < t'_c < t'_e$ such that

1. $Snap(r, t_s) = Snap(r', t'_s)$,

2. $S = Snap(r', t'_c)$, and

3. $Snap(r, t_e) = Snap(r', t'_e)$.



Mechanisms for recording global states come in useful, for example, in 
association with recovery from system failure. In fact, many applications use 
such algorithms in order to retain "checkpoints" : global states that can be 
"rolled back" into, when failure occurs [38]. The Centibroom Theorem sug- 
gests a synchronous variation for Chandy and Lamport's original algorithm. 
We will actually consider two variants: the first being message optimal, and 
the second providing time optimization. 

When activated, the Simultaneous Global Snapshot Protocol results in
all processes simultaneously recording their local states at a time $t$, and
all messages that are in transit on inbound communication channels at that
time. Observe that given the synchronous nature of the system, simultaneity
is a necessary requirement for achieving a consistent global state. Allowing
two processes $i$ and $j$ to record their local states at $t_i$ and $t_j$
respectively, where $|t_i - t_j| > 0$, may, in the general case, result in
an inconsistency: it may be the case that there is no possible global state
that includes the local states defined by $(i, t_i)$ and $(j, t_j)$ both,
due to simultaneous actions that are always performed by $i$ and $j$
together at some time $t_i < t' < t_j$. Summing up, if snapshot $S$ is
consistent with the interval $[t_s, t_e]$ of run $r$, then there exists some
time $t_c \in [t_s, t_e]$ such that $S = Snap(r, t_c)$.

^We thank Gadi Taubenfeld for suggesting this application to us.

The algorithm is quite simple. We denote by $Diameter_i$ the distance
of the process $j$ furthest from $i$, when measuring based on $D_{ij}$. We
assume that the protocol may be initiated (from the outside) at any process
in the system, or even in several places in the system. Algorithm 1 shows
the protocol's pseudo code. The (arbitrary) initiator node $(i, t)$ floods
the system with initiate messages, which indicate time
$t' = t + Diameter_i$ as the time at which the "snapshot" must be taken. By
definition of $Diameter_i$, these messages arrive at all sites by time $t'$.
At $t'$ every process $j$ records its own local state, and starts recording
incoming communications on each of its inbound channels. Recording the
channel $h \mapsto j$ takes place from time $t'$ until $t' + max_{hj}$, but
only messages that are not marked with an extra "ignore" bit are recorded.
Apart from carrying on these recordings, the processes are free to carry on
with their (non snapshot related) tasks. However, if these tasks demand that
a process $j$ send a message on some outbound channel $j \mapsto h$ prior to
time $t' + max_{jh}$, then this message is marked "ignore" by appending an
extra bit set to 1 to the message.

Note that a different mechanism could be employed for the purpose of
recording the contents of communication channels. Rather than starting to
record upon snapshot, the alternative mechanism would have each process
constantly keep a long-enough tail on its history so that when snapshot
occurs at time $t$, for each channel $i \mapsto j$, process $i$ can recount
all messages sent on the channel which may potentially still be en route.
Those would be all messages sent after $t - max_{ij}$. At the price of
greater stress on memory resources, the algorithm would complete the
snapshot recording faster. However, in order to gain a complete picture of
the state of the channel $i \mapsto j$, we would have to further compare the
local states of $i$ and $j$ at the snapshot time. For this reason we opt for
the version presented below, its simplicity being better suited for our
explanatory purposes.

Algorithm 1 Simultaneous Global Snapshot - $P^{snap\text{-}1}$

    procedure Initiator node (i, t):
        snapshot ← t + Diameter_i
        for all outgoing channels i ↦ h do
            send_h(initiate(snapshot))

    procedure Arbitrary node (j, t'):
        if receive initiate(S) then
            snapshot ← S
            for all outgoing channels j ↦ h do
                send_h(initiate(snapshot))
        if t' = snapshot then record local state
        for all incoming channels h ↦ j do
            receive msg on channel
            if snapshot ≤ t' ≤ snapshot + D(h, j) ∧ ¬msg.transparent then
                record msg

The following lemma proves the protocol's correctness.

Lemma 19 Choose $r \in \mathcal{R}(P^{snap\text{-}1}, \gamma^{\max})$ where
snapshot initiation occurs at $(i, t)$. Then there exists a time
$t' > t + Diameter_i$ where each process $j$ contains

1. record of its local state at time $t + Diameter_i$, and

2. record of incoming messages en route at time $t + Diameter_i$.

Proof That all processes' local states are simultaneously recorded at $t'$
is straightforward from the definitions. That exactly those messages that
were in transit at time $t'$ are recorded can be seen by noting first that
all messages in transit on channel $h \mapsto j$ at $t'$ are guaranteed to
arrive by time $t' + max_{hj}$, at which point recording on that channel
stops. Moreover, messages sent after $t'$ but which arrive at $j$ before
$t' + max_{hj}$ will be marked transparent and will not be recorded. Thus,
the algorithm is correct in recording the global state at time $t'$. ■
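The flooding phase of Algorithm 1 can be exercised with a short simulation. This is only a sketch under stated assumptions: `flood_initiate` and the `delay` hook are hypothetical names introduced here, every process is assumed to forward the initiate message once (on first receipt), `delay` is assumed to return a transmission time between 1 and the channel bound, and the network is assumed to be strongly connected. The three-process network is made up for illustration.

```python
import heapq
from math import inf


def distances(procs, bounds):
    """All-pairs shortest distances D[i][j] over the channel bounds (Floyd-Warshall)."""
    D = {i: {j: 0 if i == j else bounds.get((i, j), inf) for j in procs} for i in procs}
    for m in procs:
        for i in procs:
            for j in procs:
                D[i][j] = min(D[i][j], D[i][m] + D[m][j])
    return D


def flood_initiate(procs, bounds, initiator, t0, delay):
    """Return (snapshot_time, first_receipt) for the flooding phase.

    delay(src, dst, bound) is a hypothetical hook giving the actual transmission
    time of one message; it is assumed to lie in 1..bound.
    """
    D = distances(procs, bounds)
    snapshot = t0 + max(D[initiator].values())        # t + Diameter_i
    first_receipt = {}
    events = [(t0, initiator)]
    while events:
        now, p = heapq.heappop(events)
        if p in first_receipt:
            continue                                   # already forwarded; ignore repeats
        first_receipt[p] = now
        for (src, dst), bound in bounds.items():
            if src == p and dst not in first_receipt:
                heapq.heappush(events, (now + delay(src, dst, bound), dst))
    return snapshot, first_receipt


bounds = {("a", "b"): 2, ("b", "a"): 2, ("b", "c"): 3,
          ("c", "b"): 3, ("a", "c"): 7, ("c", "a"): 7}
slowest = flood_initiate(["a", "b", "c"], bounds, "a", 0, lambda s, d, b: b)
print(slowest)
```

Even when every message takes its maximal time, each process has learned the snapshot time no later than the snapshot time itself, which is the property the correctness proof relies on.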

The algorithm is straightforward. A revised version of the algorithm
can ensure time optimality. The protocol starts the same, with the
initiating node $(i, t)$ flooding the system with initiate messages bearing
the value $t + Diameter_i$. However, in this version, every process $j$ that
gets such a message at time $t'$ checks to see whether it can ensure an even
quicker simultaneous recording response, i.e. whether
$t' + Diameter_j < t + Diameter_i$. If so, it will start to flood the system
with initiate messages bearing $t' + Diameter_j$.
Algorithm 2 Simultaneous Global Snapshot - $P^{snap\text{-}2}$

    procedure Initiator node (i, t):
        snapshot ← t + Diameter_i
        for all outgoing channels i ↦ h do
            send_h(initiate(snapshot))

    procedure Arbitrary node (j, t'):
        if receive initiate(S) then
            if S < t' + Diameter_j then
                snapshot ← S
            else
                snapshot ← t' + Diameter_j
            for all outgoing channels j ↦ h do
                send_h(initiate(snapshot))
        if t' = snapshot then record local state
        for all incoming channels h ↦ j do
            receive msg on channel
            if snapshot ≤ t' ≤ snapshot + D(h, j) ∧ ¬msg.transparent then
                record msg
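To see how much the revised protocol can gain, the following sketch computes the snapshot time that Algorithm 2 would settle on in a given run. It rests on a simplification that is ours rather than the thesis's: since every process forwards initiate messages, the earliest suggestion a process $j$ can make is its first receipt time plus $Diameter_j$, so the agreed snapshot time is taken to be the minimum of these values. The function names, the `delay` hook (assumed to return a value between 1 and the bound) and the three-process line network are hypothetical.

```python
import heapq
from math import inf


def distances(procs, bounds):
    """All-pairs shortest distances D[i][j] over the channel bounds (Floyd-Warshall)."""
    D = {i: {j: 0 if i == j else bounds.get((i, j), inf) for j in procs} for i in procs}
    for m in procs:
        for i in procs:
            for j in procs:
                D[i][j] = min(D[i][j], D[i][m] + D[m][j])
    return D


def snapshot_time_2(procs, bounds, initiator, t0, delay):
    """Snapshot time reached under the revised protocol (strongly connected network assumed)."""
    D = distances(procs, bounds)
    diameter = {p: max(D[p].values()) for p in procs}
    first_receipt = {}
    events = [(t0, initiator)]
    while events:                                   # plain flooding gives first receipt times
        now, p = heapq.heappop(events)
        if p in first_receipt:
            continue
        first_receipt[p] = now
        for (src, dst), bound in bounds.items():
            if src == p and dst not in first_receipt:
                heapq.heappush(events, (now + delay(src, dst, bound), dst))
    # each process j can suggest first_receipt[j] + Diameter_j; the smallest suggestion wins
    return min(first_receipt[p] + diameter[p] for p in procs)


line = {("a", "b"): 4, ("b", "a"): 4, ("b", "c"): 4, ("c", "b"): 4}
print(snapshot_time_2(["a", "b", "c"], line, "a", 0, lambda s, d, b: b))  # slowest run
print(snapshot_time_2(["a", "b", "c"], line, "a", 0, lambda s, d, b: 1))  # fastest run
```

In the slowest run the result coincides with the initiator's own suggestion $t + Diameter_a = 8$, while in the fastest run the centrally placed process b receives the message at time 1 and can bring the snapshot forward to time 5.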
Lemma 20 Protocol $P^{snap\text{-}2}$ has the following two properties:

Correctness: It is correct.

Optimality: No other protocol can ensure a shorter delay between initiation
and time of snapshot.

Proof

Correctness: Fix a run $r \in \mathcal{R}(P^{snap\text{-}2}, \gamma^{\max})$
where initiation of the snapshot algorithm occurs at
$\theta_0 = (i_0, t_0)$, setting snapshot time for
$t'_0 = t_0 + Diameter_{i_0}$. If no shorter term initiate messages are
issued within the interval $[t_0, t'_0]$ then $r$ is also a
$P^{snap\text{-}1}$ run, and is thus correct by Lemma 19.

Otherwise, let $t'_1 < t'_0$ be the earliest snapshot time suggested after
initiation, and let $\theta_1 = (i_1, t_1)$ be the issuing node. As
$t'_1 = t_1 + Diameter_{i_1}$ and no process issues a shorter term initiate
message, $initiate(t'_1)$ is guaranteed to arrive at all processes no later
than $t'_1$. Again, as no process issues a shorter term snapshot suggestion,
the local variable snapshot is equal to $t'_1$ at time $t'_1$ in all
processes. Now, based on Lemma 19, the run is shown to be correct.

Optimality: By the Centibroom Theorem, any protocol in which a simultaneous
action on the part of all processes is dependent upon snapshot initiation
must contain a centibroom for $\langle \theta_0, \mathbb{P} \rangle$ where
$\theta_0 = (i_0, t_0)$ is the initiation node. Choose a run
$r \in \mathcal{R} = \mathcal{R}(P^{snap\text{-}2}, \gamma^{\max})$ where
initiation occurs at $\theta_0$ and snapshot at $t'_0$.

Suppose that there exists a centibroom node $(i_1, t_1)$ for
$\langle \theta_0, \mathbb{P} \rangle$ in $(r, t_0..t'_1)$, where
$t'_1 < t'_0$. Assume without loss of generality that for every
$t'' < t'_1$ there are no centibrooms for
$\langle \theta_0, \mathbb{P} \rangle$ in $(r, t_0..t'')$. By definition of
centibroom, $\theta_0 \rightsquigarrow \theta_1$ and
$\theta_1 \dashrightarrow (h, t'_1)$ for all $h \in \mathbb{P}$. At $t_1$ or
sooner, $i_1$ receives an $initiate(S)$ message with some suggested snapshot
time $S$. Since $t_1 + Diameter_{i_1} = t'_1 < t'_0 \le S$, and as $i_1$ is
following $P^{snap\text{-}2}$, it immediately starts to flood the system
with $initiate(t'_1)$ messages. As no shorter term suggestion is made, by
the above proof of the correctness of $P^{snap\text{-}2}$, snapshot occurs
at $t'_1 < t'_0$, in contradiction to the assumption that snapshot occurs at
$t'_0$.

We thus obtain that for every run $r \in \mathcal{R}$ in which initiation
occurs at $\theta_0$, the shortest interval within which a centibroom can be
established is $[t_0, t'_0]$, where $t'_0$ is the time at which snapshot
actually occurs. As all nodes in $fut(\theta_0)$ flood the initiate
messages, there cannot be a protocol $P'$ where information about initiation
disseminates any faster than in $P^{snap\text{-}2}$, and hence in particular
a centibroom cannot be established any faster than in $P^{snap\text{-}2}$,
and the delay between initiation and snapshot is at least as long as it is
in $P^{snap\text{-}2}$. ■



4.5 Sufficiency of Centibrooms for Common Knowledge Gain

We proceed to show that the centibroom indeed characterizes common
knowledge gain in synchronous systems, in the same way nested knowledge
gain is characterized by centipedes. We will show that the existence of a
centibroom is sufficient for common knowledge gain in every
$\mathcal{R}^{\mathrm{fip}}$ system by using the Induction Rule for Common
Knowledge, which states that from $\mathcal{R}^{\mathrm{fip}} \models
\alpha \rightarrow E_G(\alpha \wedge \beta)$ we can infer
$\mathcal{R}^{\mathrm{fip}} \models \alpha \rightarrow C_G\beta$.
Importantly, processes must now make explicit use of their capability to
discern global time in order to gain common knowledge, due to the essential
part played by bound guarantees.

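Theorem 11 If $(\mathcal{R}^{\mathrm{fip}}, r, t) \models K_{i_0}\varphi$
and there is a centibroom node $\theta$ for $\langle i_0, G \rangle$ in
$(r, t..t')$, then $(\mathcal{R}^{\mathrm{fip}}, r, t') \models
C_G(At(\varphi, t))$.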

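Proof Assume that the conditions of the theorem hold, and let $\theta =
(j, t_j)$. In particular, $(j, t_j) \dashrightarrow (i, t')$ for every
$i \in G$. From $(\mathcal{R}^{\mathrm{fip}}, r, t) \models K_{i_0}\varphi$
and $(i_0, t) \rightsquigarrow (j, t_j)$ in $r$ we have by Lemma 14 that
$(\mathcal{R}^{\mathrm{fip}}, r, t_j) \models K_j(At(\varphi, t))$. We now
use the induction rule with $\alpha$ set to $(time = t') \wedge
At(K_j(At(\varphi, t)), t_j)$, and $\beta$ being $At(\varphi, t)$. Since
$\mathcal{R}^{\mathrm{fip}} \models \alpha \rightarrow \beta$ in this case,
it suffices to show that $\mathcal{R}^{\mathrm{fip}} \models \alpha
\rightarrow E_G\alpha$. Thus, let $r' \in \mathcal{R}^{\mathrm{fip}}$ and
fix time $\hat{t}$. If $(\mathcal{R}^{\mathrm{fip}}, r', \hat{t})
\not\models \alpha$ then $\alpha \rightarrow E_G\alpha$ is trivially
satisfied in $(r', \hat{t})$. Now suppose that
$(\mathcal{R}^{\mathrm{fip}}, r', \hat{t}) \models \alpha$, giving us that
$\hat{t} = t'$ and thus $(\mathcal{R}^{\mathrm{fip}}, r', t') \models
At(K_j(At(\varphi, t)), t_j)$. This, in turn, gives us
$(\mathcal{R}^{\mathrm{fip}}, r', t_j) \models K_j(At(\varphi, t))$ by
application of TS1 (Lemma 12). Fix $i \in G$. Since $(j, t_j)$ is a
centibroom node, we have $(j, t_j) \dashrightarrow (i, t')$. By Lemma 3 it
is also the case that $(j, t_j) \dashrightarrow (i, t')$ in $r'$. Using
Lemma 14 we now obtain $(\mathcal{R}^{\mathrm{fip}}, r', t') \models
K_i At(K_j(At(\varphi, t)), t_j)$. Moreover, the fact that the time is part
of the local state in $\gamma^{\max}$ implies that
$(\mathcal{R}^{\mathrm{fip}}, r', t') \models K_i(time = t')$. It follows
that $(\mathcal{R}^{\mathrm{fip}}, r', t') \models K_i\alpha$, and since $i$
was an arbitrarily chosen member of $G$ then
$(\mathcal{R}^{\mathrm{fip}}, r', t') \models E_G\alpha$. It follows that
$\mathcal{R}^{\mathrm{fip}} \models \alpha \rightarrow E_G\alpha$. Since
$\beta = At(\varphi, t)$ we obtain by the Induction Rule that
$\mathcal{R}^{\mathrm{fip}} \models \alpha \rightarrow
C_G(At(\varphi, t))$. Finally, since $(\mathcal{R}^{\mathrm{fip}}, r, t')
\models \alpha$ we obtain that $(\mathcal{R}^{\mathrm{fip}}, r, t') \models
C_G(At(\varphi, t))$, as desired.

$\Box_{\text{Theorem 11}}$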

In order to relate the centibroom in a fip system to a solution to the 
simultaneous response problem SR, we must tie in common knowledge to 
action. Such a connection is established if we assume that the protocol 
is also considerate with respect to SR (see Definition 21). We obtain the 
following result by immediate application of Lemma 2 to Theorem 11. 

Theorem 12 (Nested Knowledge Sufficiency) Let $P$ be an fip that is
also considerate with respect to $SR = \langle e_t, \alpha_1, \ldots,
\alpha_k \rangle$. If for every $r \in \mathcal{R}^{\mathrm{fip}} =
\mathcal{R}(P, \gamma^{\max})$ in which $e$ is an ND event at $(i_0, t)$
there exists time $t'$ such that a centibroom for
$\langle i_0, \ldots, i_k \rangle$ exists in $(r, t..t')$, then $P$ solves
$SR$.

4.6 Common Knowledge as a Finite Conjunction 

Common knowledge is typically perceived in terms of an infinite conjunction
of $E_G^k\varphi$, for $k > 0$. There are also definitions of common
knowledge in terms of a fixed point (see, e.g., [29, 15, 5]). The centibroom
structure and the necessity of centibrooms for common knowledge support the
fixed-point view: the only way in which a new fact can become common
knowledge is if there is a singular point, represented by the centibroom
node $\theta$, which carries the information that $\theta$ is a centibroom
for all processes in $G$ at time $t'$. At time $t'$, everyone can become
aware of its existence, and the fixed-point yields common knowledge. This is
also consistent with the view advocated by [9, 29], that a shared
environment is required for common knowledge to arise.

Even though the fixed point definition implies the infinite conjunction,
Fischer and Immerman [18] showed that in finite-state systems, where the
set of all global states in a system $R$ is finite, there is a power $m$
such that $C_G\varphi$ is equivalent to $E_G^m\varphi$. The fip protocol,
with its perfect recall property, in the synchronous context produces a
state space whose size is unbounded. Nevertheless, given the role of the
centipede and centibroom structures in $\gamma^{\max}$, we now show that
there are cases in which common knowledge is a finite conjunction under fip
in $\gamma^{\max}$ as well.
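The finite-state phenomenon can be observed directly on a small model. The sketch below is illustrative only and is not taken from [18]: it represents each process's knowledge by the set of worlds it cannot tell apart from the current one (assumed reflexive, so that the iterates of $E_G$ shrink), and iterates $E_G$ until it stops changing. The function names and the three-world model are made up for the example.

```python
def everyone_knows(worlds, classes, group, fact):
    """E_G(fact): the worlds where every p in group knows fact.

    classes[p][w] is the set of worlds p cannot distinguish from w (it contains w).
    """
    return {w for w in worlds if all(classes[p][w] <= fact for p in group)}


def common_knowledge(worlds, classes, group, fact):
    """Iterate E_G until it stabilises; return (C_G(fact), m) where C_G(fact) = E_G^m(fact)."""
    current, m = everyone_knows(worlds, classes, group, fact), 1
    while True:
        nxt = everyone_knows(worlds, classes, group, current)
        if nxt == current:              # fixed point reached: deeper nesting adds nothing
            return current, m
        current, m = nxt, m + 1


worlds = {0, 1, 2}
classes = {
    "a": {0: {0, 1}, 1: {0, 1}, 2: {2}},
    "b": {0: {0}, 1: {1, 2}, 2: {1, 2}},
}
print(common_knowledge(worlds, classes, ["a", "b"], {1, 2}))
```

In this model the fact holds in worlds 1 and 2, everyone knows it only in world 2, and it is common knowledge nowhere; the iteration detects this after two steps. On a finite model the loop always terminates, which is the sense in which some finite power of $E_G$ already captures $C_G$.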

Roughly speaking, when running fip it takes time to obtain deep knowledge
without having common knowledge. Indeed, we obtain a sharp bound on the
depth of $E_G$ that can be obtained $d$ time units after the occurrence of
a nondeterministic event. Given a group of size $|G| = g$ and natural number
$d > 0$, we denote $M_{dg} = (d-1)(g-1)+2$. We prove:

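Theorem 13 Let $e$ be an ND event occurring at $(i_0, t)$ in
$r \in \mathcal{R}^{\mathrm{fip}}$, let $d > 0$, and $|G| = g$. If
$(\mathcal{R}^{\mathrm{fip}}, r, t+d) \models E_G^{M_{dg}}
\mathsf{occurred}(e)$ then $(\mathcal{R}^{\mathrm{fip}}, r, t+d) \models
C_G(\mathsf{occurred}(e) \wedge \mathsf{ND}(e))$.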
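For a feel of the threshold, here are a few sample values of $M_{dg}$ (the particular pairs $(d, g)$ are chosen here only for illustration):

```latex
\[
  M_{dg} = (d-1)(g-1)+2:
  \qquad M_{1,g} = 2,
  \qquad M_{2,3} = 1\cdot 2 + 2 = 4,
  \qquad M_{3,4} = 2\cdot 3 + 2 = 8 .
\]
```

So, for instance, the theorem says that a group of four processes that attains $E_G^{8}\,\mathsf{occurred}(e)$ three time units after $e$ has in fact attained common knowledge.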

Note that although a centipede's "body" nodes
$\langle \theta_0, \ldots, \theta_k \rangle$ are naturally conceived of as
distinct, they need not be such. Yet recall that by Lemma 4, when two body
nodes are distinct, their time components must also be distinct.

Theorem 13 follows directly by Theorem 11 from the following lemma: 

Lemma 21 Let $r \in \mathcal{R}^{\max}$, $d > 0$, $G \subseteq \mathbb{P}$
with $g = |G|$, and assume that $e$ is an ND event at $(i_0, t)$ in $r$. If
$(\mathcal{R}^{\max}, r, t+d) \models E_G^{M_{dg}}\mathsf{occurred}(e)$ then
there exists a centibroom node for $\langle i_0, G \rangle$ in
$(r, t..t+d)$.

Proof Assume that $(\mathcal{R}^{\max}, r, t+d) \models
E_G^{M_{dg}}\mathsf{occurred}(e)$. If $(i_0, t)$ is a centibroom for
$\langle i_0, G \rangle$ in $(r, t..t+d)$ then we are done. Otherwise
$|G| > 1$, and moreover there is some $j \in G$ such that
$(i_0, t) \not\dashrightarrow (j, t+d)$. For notational convenience, let us
denote the processes of $G$ by $\{j_0, \ldots, j_{g-1}\}$, where
$(i_0, t) \not\dashrightarrow (j_0, t+d)$. Denote $M = M_{dg} - 1$ and let
$f(h) = j_{(h \bmod g)}$ for all $h \le M$. Thus, $f$ maps natural numbers
into members of $G$, every interval of $g$ adjacent numbers is mapped to the
full set $\{j_0, \ldots, j_{g-1}\} = G$, and $f(0) = j_0$. We focus on a
knowledge formula of the form

$\Psi(e) = K_{f(M)} K_{f(M-1)} \cdots K_{f(1)} K_{f(0)}
\mathsf{occurred}(e)$.

Observe that there are $M + 1 = (|G|-1) \cdot (d-1) + 2$ knowledge
operators in $\Psi(e)$, all of which belong to processes in $G$. By
assumption, $(\mathcal{R}^{\max}, r, t+d) \models
E_G^{M_{dg}}\mathsf{occurred}(e)$, and hence in particular
$(\mathcal{R}^{\max}, r, t+d) \models \Psi(e)$. The Knowledge Gain Theorem
implies that there exists a centipede for
$\langle i_0, f(0), f(1), \ldots, f(M) \rangle$ in $(r, t..t+d)$. Let

$\langle (i_0, t), \Omega^0, \Omega^1, \ldots, \Omega^{M-1},
(f(M), t+d) \rangle$

be such a centipede. By definition of a centipede we have that
$(i_0, t) \rightsquigarrow \Omega^0$ and
$\Omega^0 \dashrightarrow (j_0, t+d)$. Since $\dashrightarrow$ is
transitive, the fact that $(i_0, t) \not\dashrightarrow (j_0, t+d)$ implies
that $(i_0, t) \not\dashrightarrow \Omega^0$. Since $\dashrightarrow$ is
reflexive we have that $(i_0, t) \ne \Omega^0$. Recall by definition of $f$
that $f(M) = f(M-1) + 1 \bmod g$. Since $g > 1$, clearly
$f(M) \ne f(M-1)$. Hence, by Lemma 4 we have that
$(f(M), t+d) \not\dashrightarrow (f(M-1), t+d)$. It follows that
$\Omega^{M-1} \ne (f(M), t+d)$.

By Lemma 4, if $\Omega^h = \Omega^{h'}$ then
$\Omega^h = \Omega^{h''} = \Omega^{h'}$ for every $h''$ in the range
$h \le h'' \le h'$. Let $\Omega_1, \ldots, \Omega_D$ denote the maximal
sub-sequence of distinct nodes in the sequence
$\Omega^0, \ldots, \Omega^{M-1}$. Lemma 4 implies that the times at which
the nodes $(i_0, t), \Omega_1, \ldots, \Omega_D, (f(M), t+d)$ occur form a
strictly increasing sequence, and so $D \le d-1$. For all $b$ in the range
$1 \le b \le D$ define $s(b) = \{k : \Omega^k = \Omega_b\}$. Since
$M = (|G|-1) \cdot (d-1) + 1$ and $D \le d-1$, we have by the pigeonhole
principle that $|s(b')| \ge |G|$ for at least one such index $b'$. Since the
set $s(b')$ consists of at least $|G| = g$ consecutive natural numbers, we
have that $\{f(k) : k \in s(b')\} = \{j_0, \ldots, j_{g-1}\} = G$. By
definition of the centipede it follows that
$\Omega_{b'} \dashrightarrow (j, t+d)$ for all
$j \in \{f(k) : k \in s(b')\} = G$, and so $\Omega_{b'}$ is a centibroom
node for $\langle i_0, G \rangle$ in $(r, t..t+d)$, as required. ■
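The counting step of the proof can be checked by brute force for small parameters. The sketch below is an illustration and not part of the proof: it identifies the processes $j_0, \ldots, j_{g-1}$ with the numbers $0, \ldots, g-1$, so that $f(h) = h \bmod g$, and enumerates every way of splitting the body indices into at most $d-1$ consecutive blocks (one block per distinct node). The function name is hypothetical.

```python
from itertools import combinations


def some_block_covers_group(d, g, length):
    """True if every split of the indices 0..length-1 into at most d-1 consecutive
    blocks contains a block whose images under f(h) = h mod g cover all of 0..g-1."""
    full = set(range(g))
    for blocks in range(1, d):
        for cuts in combinations(range(1, length), blocks - 1):
            edges = (0, *cuts, length)
            segments = [range(edges[x], edges[x + 1]) for x in range(blocks)]
            if not any({k % g for k in seg} == full for seg in segments):
                return False            # found a split in which no block covers the group
    return True


d, g = 3, 3
M = (g - 1) * (d - 1) + 1               # number of body nodes used in the proof
print(some_block_covers_group(d, g, M), some_block_covers_group(d, g, M - 1))
```

With $M = (g-1)(d-1)+1$ body nodes some block always covers the whole group, whereas with one node fewer the indices can be split into $d-1$ blocks of $g-1$ nodes each, none of which does. This only illustrates why the pigeonhole argument needs all $M$ nodes; the tightness of the bound itself is the subject of Lemma 22.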

As the next lemma shows, the bound of $M_{dg} = (d-1)(g-1)+2$ of
Lemma 21 is tight.

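Lemma 22 For every $t \ge 0$, $d > 0$ and $g > 1$ there exists a run
$r \in \mathcal{R}^{\mathrm{fip}}$, an ND event $e$ at $(i_0, t)$ in $r$ and
a set of processes $G \subseteq \mathbb{P}$ of size $|G| = g$, such that

$(\mathcal{R}^{\mathrm{fip}}, r, t+d) \models
E_G^{M_{dg}-1}\mathsf{occurred}(e) \wedge \neg C_G\mathsf{occurred}(e)$.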

Proof Fix $d$, $g$. Define $\gamma^{\max}_{d,g}$ to be a synchronous context
with the following properties:

• Let $G = \{j_0, \ldots, j_{g-1}\}$. For every $m < g$, denote by $G_{-m}$
  the set $G \setminus \{j_m\}$.

• Let $\mathbb{P} = G \cup \{i_0\} \cup
  \{h_{k,m}\}_{1 \le k < d,\ 0 \le m < g}$. The set of processes is seen in
  Figure 4.4a.

• The network graph is complete, and the bounds on transmission times
  are as follows

  1. for every $k < d$ and $m < g$, $b(h_{k,m}, j) = 1$ for all
     $j \in G_{-m}$

  2. for every other $i, j \in \mathbb{P}$, $b(i, j) = d+1$

For every $1 \le k < d$, use $H_k$ to denote the set
$\{h_{k,m}\}_{0 \le m < g}$. Note that for every
$r \in \mathcal{R}^{\mathrm{fip}}_{d,g} =
\mathcal{R}(\mathrm{fip}, \gamma^{\max}_{d,g})$, as the processes are
running the fip, every process sends every other process a message at every
time unit. Note also that there can be no centibroom node for
$\langle i_0, G \rangle$ in $(r, 0..d)$, because for every process $i$ there
exists at least one $j \in G$ such that $b(i, j) > d$. Hence, by Theorem 10,
$(\mathcal{R}^{\mathrm{fip}}_{d,g}, r, t+d) \models
\neg C_G\mathsf{occurred}(e)$.

Choose $r \in \mathcal{R}^{\mathrm{fip}}_{d,g}$ such that an ND event $e$
occurs at $(i_0, t)$ and such that all sent messages arrive at the maximally
allowed transmission time, except for the following ones:

1. For every $h \in H_1$, the message sent from $i_0$ to $h$ at time 0
   arrives at time 1.

2. For every $1 \le k < d-1$, for every pair of processes $h_1 \in H_k$ and
   $h_2 \in H_{k+1}$, the message sent by $h_1$ to $h_2$ at time $k$ arrives
   at $k+1$.

3. For every $h \in H_{d-1}$ and $j \in G$, the message sent from $h$ to $j$
   at time $d-1$ arrives at time $d$.

The existence of $r$ is guaranteed by definition of
$\mathcal{R}^{\mathrm{fip}}_{d,g}$: the run is a legal possible execution of
fip in the defined context.

Use $f(k)$ to denote the value $(g-1) \cdot k$ for every $k \ge 0$. Fix a
sequence $S = \langle i_0, i_1, \ldots, i_{f(d-1)+1} \rangle$ such that
$\{i_1, \ldots, i_{f(d-1)+1}\} \subseteq G$. Observe that for every
$1 \le k < d$, the subsequence
$S_k = \langle i_{f(k-1)+1}, \ldots, i_{f(k)} \rangle$ contains exactly
$g-1$ elements, and so there must exist some $j(k) \in G$ such that
$j(k) \ne i$ for every $i \in S_k$.

We now define a node sequence $\langle (i_0, t), \theta_1, \ldots,
\theta_{f(d-1)}, (i_{f(d-1)+1}, d) \rangle$ and show that it is a centipede
for $S$ in $(r, 0..d)$. For every $l = 1, \ldots, f(d-1)$, let
$k = \lceil l/(g-1) \rceil$, and define $\theta_l = (h_{k,j(k)}, k)$.
Observe that $f(k-1) < l \le f(k)$, and hence by choice of $j(k)$ that
$b(h_{k,j(k)}, i_l) = 1$. Since $k \le d-1$ we obtain that
$\theta_l \dashrightarrow (i_l, d)$. Moreover, if $l < f(d-1)$ then
$\theta_l \rightsquigarrow \theta_{l+1}$. For if $k > l/(g-1)$ then
$\theta_l = \theta_{l+1}$ and the result stems from the reflexivity of
$\rightsquigarrow$, while if $k = l/(g-1)$ then, noting that
$\theta_l \in H_k$ and $\theta_{l+1} \in H_{k+1}$, we get the result from
clause (2) above. Finally, we note that
$(i_0, t) \rightsquigarrow \theta_1 = (h_{1,j(1)}, 1)$ since
$h_{1,j(1)} \in H_1$ and using clause (1), and similarly that
$\theta_{f(d-1)} = (h_{d-1,j(d-1)}, d-1) \rightsquigarrow
(i_{f(d-1)+1}, d)$ since $h_{d-1,j(d-1)} \in H_{d-1}$ and from clause (3)
above. Figure 4.4b shows a fragment of the described centipede.

We have shown that there exists a centipede in $(r, 0..d)$ for every
sequence $\langle i_0, i_1, \ldots, i_{f(d-1)+1} \rangle$ such that
$\{i_1, \ldots, i_{f(d-1)+1}\} \subseteq G$. By Theorem 7 we get that
$(\mathcal{R}^{\mathrm{fip}}_{d,g}, r, t') \models K_{i_{f(d-1)+1}} \cdots
K_{i_1}\mathsf{occurred}(e)$ for every such sequence. We thus obtain,
considering that $f(d-1) + 1 = (g-1)(d-1) + 1 = M_{dg} - 1$, that
$(\mathcal{R}^{\mathrm{fip}}_{d,g}, r, t') \models
E_G^{M_{dg}-1}\mathsf{occurred}(e)$, by definition of the $E$ operator. ■
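The claim that the context of this proof admits no centibroom within $d$ time units can be checked mechanically for small $d$ and $g$. The sketch below builds the transmission bounds as described above and verifies that every process is more than $d$ away from at least one member of $G$. Two points are our assumptions rather than the thesis's: distances are measured along shortest paths over the bounds, and processes are encoded as tuples such as `("h", k, m)`. The function names are hypothetical.

```python
from math import inf


def lemma22_bounds(d, g):
    """Processes, the group G and the bound function b of the context gamma_{d,g}."""
    G = [("j", m) for m in range(g)]
    H = [("h", k, m) for k in range(1, d) for m in range(g)]
    procs = G + [("i0",)] + H
    b = {(p, q): d + 1 for p in procs for q in procs if p != q}
    for h in H:
        for j in G:
            if j[1] != h[2]:            # h_{k,m} is fast towards every member of G_{-m}
                b[(h, j)] = 1
    return procs, G, b


def no_centibroom_possible(d, g):
    """True if every process is further than d from some member of G."""
    procs, G, b = lemma22_bounds(d, g)
    D = {p: {q: 0 if p == q else b.get((p, q), inf) for q in procs} for p in procs}
    for m in procs:                     # Floyd-Warshall over the bounds
        for p in procs:
            for q in procs:
                D[p][q] = min(D[p][q], D[p][m] + D[m][q])
    return all(max(D[p][j] for j in G) > d for p in procs)


print(no_centibroom_possible(3, 3))
```

Since a centibroom node $(p, u)$ with $u \ge 0$ would need to be within $d - u$ of every member of $G$, a process that is more than $d$ away from some member cannot host one.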
Figure 4.4: The setup for Lemma 22



Theorem 13 and Lemma 22 tightly bound the levels of $E_G$ that can
hold without common knowledge necessarily arising. They draw an essential
connection between this bound, the size of the set of processes $G$ in
question, and the time that elapses since the ND event of interest occurs.
It is natural to ask whether this property is restricted to fip, or perhaps
may be true in general. We now show that it is not true for all protocols.
In fact, there is a protocol that can attain arbitrary levels of nested
knowledge quickly, without giving rise to common knowledge.

Example 6 Let $\gamma^{\max'}$ be a context with $\mathbb{P} = \{s, 0, 1\}$,
where the network is V-shaped with $s$ at the base, and the communication
bounds are $b_{s,0} = b_{s,1} = 1$. The initial state of process $s$
contains an initial value consisting of a natural number $k \ge 0$. We
assume that the protocol $P$ that $s$ is following prescribes the following
actions upon receiving an external input (an event that we denote by $e$):
If $k$ is odd, then $s$ sends the message $(\mathsf{occurred}(e), k)$ to
process 1, and the message $(\mathsf{occurred}(e), k-1)$ to process 0. If
$k$ is even, then $s$ sends the message $(\mathsf{occurred}(e), k)$ to
process 0 and, in case $k > 0$, it also sends the message
$(\mathsf{occurred}(e), k-1)$ to process 1. Moreover, $s$ never sends a
message of the form $(\mathsf{occurred}(e), d)$ if $e$ does not occur.

Thus, if $k = 0$ then only one process will receive a message, and in all
other cases both of them will. Whenever a process receives the message
$(\mathsf{occurred}(e), h)$, it knows that $e$ occurred but does not know
whether $k = h$ or $h + 1$. In particular, upon receiving
$(\mathsf{occurred}(e), 0)$, process 0 considers it possible that 1 received
nothing and does not know that $e$ occurred.

We now show that arbitrarily deeply nested knowledge can be obtained
in this setting within a single time step, without common knowledge arising:

Lemma 23 In the context of Example 6, let $r \in R =
\mathcal{R}(P, \gamma^{\max'})$, let $G = \{0, 1\}$ and assume that the
event $e$, consisting of the receipt of an external input by $s$ at time
$t$, occurs in $r$. If the initial value of $s$ in $r$ is $k$ then

$(R, r, t+1) \models E_G^k\mathsf{occurred}(e) \wedge
\neg C_G\mathsf{occurred}(e)$.

Proof We split the proof into two parts, handled by Lemmas 24 and 25.
Assume that $e$ occurs in $r$ at time $t$ as stated, and that the initial
value is $k$. By Lemma 24 we have that $(R, r, t+1) \models
E_G^k\mathsf{occurred}(e)$ and by Lemma 25 that $(R, r, t+1) \models
\neg E_G^{k+1}\mathsf{occurred}(e)$. Since $\models \neg E_G^{k+1}\varphi
\rightarrow \neg C_G\varphi$ is a validity, the latter implies that
$(R, r, t+1) \models \neg C_G\mathsf{occurred}(e)$, and the claim holds. ■



Lemma 24 The conditions of Lemma 23 imply that $(R, r, t+1) \models
E_G^k\mathsf{occurred}(e)$.

Proof Observe that by the structure of the protocol,
$(\mathsf{occurred}(e), d)$ messages are sent only if $e$ indeed takes
place. Thus, for both processes $i \in \{0, 1\}$ it is the case that if $i$
receives a message of the form $(\mathsf{occurred}(e), d)$ at time $t+1$ in
$r$ with any value $d \ge 0$, then $(R, r, t+1) \models
K_i\mathsf{occurred}(e)$.

By convention, we define $E_G^0\varphi = \varphi$. We prove by induction on
$k \ge 0$ that if the initial value of $s$ in $r$ is $h \ge k$ and $e$
occurs at $(r, t)$, then $(R, r, t+1) \models E_G^k\mathsf{occurred}(e)$. In
particular, this implies that $(R, r, t+1) \models
E_G^k\mathsf{occurred}(e)$ in the case $h = k$, establishing the claim. We
consider two cases.

$k = 0$: By assumption, $e$ occurs at time $t$ in $r$, and thus
$(R, r, t+1) \models \mathsf{occurred}(e)$, and by definition of $E_G^0$
also $(R, r, t+1) \models E_G^0\mathsf{occurred}(e)$.

$k > 0$: In this case, process $i = parity(h)$ receives the message
$(\mathsf{occurred}(e), h)$, and the other process $j = 1 - i$ receives
$(\mathsf{occurred}(e), h-1)$. According to the protocol, a message
$(\mathsf{occurred}(e), d)$ is received if the initial value is either $d$
or $d+1$, and hence at least as large as $d$. Both processes thus know that
the initial value is at least as large as $h-1$. Since $h \ge k$ by
assumption, and by the inductive hypothesis we have that
$(R, r, t+1) \models E_G^{k-1}\mathsf{occurred}(e)$ whenever $h \ge k-1$,
it follows that both $(R, r, t+1) \models
K_i E_G^{k-1}\mathsf{occurred}(e)$ and $(R, r, t+1) \models
K_j E_G^{k-1}\mathsf{occurred}(e)$. Hence, $(R, r, t+1) \models
E_G^k\mathsf{occurred}(e)$ and we are done. ■



Lemma 25 The conditions of Lemma 23 imply $(\mathcal{R}, r, t+1) \models \neg E_G^{k+1}\,\mathrm{occurred}(e)$.

Proof First notice that, in every run $r' \in \mathcal{R}$, a process that does not
receive a message of the form (occurred(e), d) does not know that e occurred,
since there is another run $r'' \in \mathcal{R}$ in which its local history is identical to
the one in r', and where the event does not occur. We can now prove
the claim by induction on k. If k = 0 then process 1 does not receive a
message (occurred(e), d) by time t + 1. Thus, $(\mathcal{R}, r, t+1) \models \neg K_1\,\mathrm{occurred}(e)$ and so
$(\mathcal{R}, r, t+1) \models \neg E_G\,\mathrm{occurred}(e)$, as claimed.

Let k > 0 and assume inductively that the claim holds for k - 1.
By definition of the protocol, process i = parity(k) receives the message
(occurred(e), k), and the other process j = 1 - i receives (occurred(e), k - 1).
There is a run $r' \in \mathcal{R}$ in which $r'_j(t+1) = r_j(t+1)$ and the initial
value is k - 1. By the inductive hypothesis, $E_G^k\,\mathrm{occurred}(e)$ does not hold at
(r', t + 1). It follows that $(\mathcal{R}, r, t+1) \models \neg K_j E_G^k\,\mathrm{occurred}(e)$, and thus
$(\mathcal{R}, r, t+1) \models \neg E_G^{k+1}\,\mathrm{occurred}(e)$, and we are done. ■

We note that the epistemic structure obtained here is similar to that 
which arises in the electronic mail game of Rubinstein [47], and in the co- 
ordinated attack problem [23]. One distinguishing feature is that in our 
example here the high degree of nested knowledge is obtained in one step, 
with two messages, whereas a long interactive exchange of k messages is re- 
quired to achieve k levels of nesting in the other cases. A similar epistemic 
structure also arises in the analysis of the initial states of the muddy children 
puzzle [15], or of the Conway paradox [11]. 
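
The nesting depth established by Lemmas 24 and 25 can be checked mechanically on a small model. The following Python sketch is an illustration only, not part of the original analysis. It assumes that the only uncertainty at time t + 1 is whether e occurred and what the initial value of s was, it caps the initial value at a hypothetical `max_value`, and it encodes the message pattern used in the proofs above: process parity(k) receives the value k, the other process receives k - 1, and a process whose value would be negative receives nothing. The names `received`, `everyone_knows` and `nesting_depth` are invented for this sketch.

```python
NO_EVENT = None  # the world in which e does not occur (assumed encoding)


def received(world, proc):
    """Value d of the (occurred(e), d) message that proc holds at time t + 1.

    Returns None when proc receives no such message.
    """
    if world is NO_EVENT:
        return None
    d = world if world % 2 == proc else world - 1
    return d if d >= 0 else None


def everyone_knows(worlds, fact):
    """Worlds at which both processes know `fact` (given as a set of worlds)."""
    return {
        w
        for w in worlds
        if all(
            v in fact
            for proc in (0, 1)
            for v in worlds
            if received(v, proc) == received(w, proc)
        )
    }


def nesting_depth(k, max_value=12):
    """Largest m such that E_G^m occurred(e) holds when the initial value is k."""
    if not 0 <= k <= max_value:
        raise ValueError("k must lie between 0 and max_value")
    worlds = [NO_EVENT] + list(range(max_value + 1))
    fact = {w for w in worlds if w is not NO_EVENT}  # occurred(e), i.e. E_G^0
    depth = 0
    while True:
        fact = everyone_knows(worlds, fact)  # next level of "everyone knows"
        if k not in fact:
            return depth
        depth += 1


if __name__ == "__main__":
    print([nesting_depth(k) for k in range(6)])
```

Under these assumptions the depth computed for initial value k is exactly k, in line with $E_G^k$ holding and $E_G^{k+1}$ failing. Since the depth is finite for every k, common knowledge of occurred(e) is never reached in this model.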

4.7 Conclusions 

Taking a step beyond nested knowledge, this chapter develops the theory 
needed in order to characterize common knowledge gain, an epistemic state 
that is only possible in synchronous systems [23]. We define the centibroom, a
simpler, tighter communication structure than the centipede, and prove the
Common Knowledge Gain Theorem that validates the centibroom's causal
nature. We then show that the centibroom is also necessary in any solution to
the Simultaneous Response problem.

Based on the full-information protocol (fip), first introduced in Chapter 3,
it is shown that centibrooms are also sufficient for common knowledge gain.
We then utilize this result to determine sharp thresholds regarding when
nested knowledge becomes common knowledge under fip. Finally, Example 6
shows that this phenomenon is not universal to all protocols: there exists a
protocol in which no depth of nested knowledge implies common knowledge.

Chapter 5

Gaining Nested Common 
Knowledge 

5.1 Introduction 

The Ordered Response problem deals with a totally ordered sequence of
responses, and the Simultaneous Response problem with groups of responses
that must be enacted in unison. As seen in Section 2.1, we can look at
the required time ordering in an instance $SR = (e_t, a_1, \ldots, a_k)$ as the set of
requirements $\{Time(e_t) < Time(a_g) \le Time(a_h) \mid g, h \in \{1, .., k\}\}$.

Two possible extensions of problem specifications come to mind. The
first extension, which we call the Ordered Group Response problem, is an
immediate generalization of the OR and SR problems.

Definition 18 (Ordered Group Response) Let $e_t$ be an external input
and let $A^h = (a^h_1, .., a^h_{\ell_h})$ be a set of responses of length $\ell_h$, for every
h = 1, .., k. A protocol P solves the instance $OGR = (e_t, A^1, \ldots, A^k)$ of
the Ordered Group Response problem if it guarantees that

1. in a triggered run, for every h = 1, ..., k all of the actions in the
response set $A^h$ are performed simultaneously; moreover, if h < k
then the actions in $A^h$ will happen before (i.e., no later than) those of
$A^{h+1}$. Finally,

2. none of the responses, in any of the sets $A^1, \ldots, A^k$, occurs in runs
that are not triggered.

We will use $I^h$ to denote the set of processes $\{i \in \mathbb{P} \mid (i, a) \in A^h\}$ for every
$h \le k$.

It is easy to see that every instance of OR can be rewritten as an instance
of OGR where all sets of responses are singletons. Similarly, every instance of
SR can be rewritten as an instance of $OGR = (e_t, A^1)$ where all simultaneous
responses are members of $A^1$.
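
To make the two clauses of the definition concrete, the following Python sketch checks a single run against an OGR instance. It is only an illustration under stated assumptions: a run is summarized by a flag saying whether it is triggered and by the time at which each response is performed (`None` if it is never performed), each response occurs at most once, and the name `satisfies_ogr` is hypothetical.

```python
from typing import Optional, Sequence


def satisfies_ogr(triggered: bool,
                  groups: Sequence[Sequence[Optional[int]]]) -> bool:
    """Check one run against the two clauses of Ordered Group Response.

    groups[h][m] is the time at which the m-th response of the (h+1)-th
    response set is performed in the run, or None if it is not performed.
    """
    if not triggered:
        # Clause 2: no response may occur in a run that is not triggered.
        return all(t is None for group in groups for t in group)

    previous = None
    for group in groups:
        # Clause 1a: every response of the set occurs, all at the same time.
        if any(t is None for t in group) or len(set(group)) != 1:
            return False
        # Clause 1b: a set is performed no later than the next one.
        if previous is not None and group[0] < previous:
            return False
        previous = group[0]
    return True


# OR as an instance with singleton sets, SR as an instance with one set.
print(satisfies_ogr(True, [[3], [5], [5]]))
print(satisfies_ogr(True, [[4, 4, 4]]))
```

The two calls at the end mirror the observation above: an OR instance uses singleton response sets, and an SR instance uses a single response set.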

The second extension, which is even wider in scope than OGR, is a problem
specification in which the required event ordering is given by an arbitrary
partial order. The Generalized Ordering problem will be defined and studied
in the next chapter. In this chapter we will focus on the OGR problem.
We defer giving a leading example until the next chapter, which will make 
use of this chapter's results. Apart from providing the foundational results 
necessary for the next chapter, this chapter can also be seen as providing a 
unifying account that merges the thus-far separately treated theories that 
surround OR and SR. 

5.2 Relating Ordered Group Response and Nested 
Common Knowledge 

In order to relate the new ordering problem to a causal structure, we will 
first identify an epistemic condition that is implied by protocols solving the 
problem. As solutions to OR require nested knowledge and those of SR 
imply common knowledge, we expect that solutions to OGR will necessitate 
a little of both kinds of epistemic states. 

We will say that nested common knowledge of $\varphi$ obtains at run $r \in \mathcal{R}$
and time t with respect to groups of processes $G_1, .., G_k$ if

\[ (\mathcal{R}, r, t) \models C_{G_k} C_{G_{k-1}} \cdots C_{G_1} \varphi \]

holds. As we will show, nested common knowledge is necessary in protocols
that solve the OGR problem. In fact, we show that nested
common knowledge is necessitated even if the protocol only weakly solves
OGR, according to the following definition.

Definition 19 (Weakly solving Ordered Group Response)

Let $OGR = (e_t, A^1, \ldots, A^k)$ be an instance of the Ordered Group Response
problem. A protocol P weakly solves OGR if it guarantees that for every
h = 1, ..., k and $a \in A^h$

1. for every $r \in \mathcal{R}^{\max}$, time t and $a' \in A^h$, a occurs at (r, t) iff a' occurs
at (r, t);

2. for every $r \in \mathcal{R}^{\max}$, time t and $a' \in A^{h'}$ where h' < h, if a occurs at
(r, t) then a' occurs at (r, t') where $t' \le t$.

Note that every protocol that solves OGR also weakly solves it, but that
the converse does not hold. A protocol that weakly solves OGR
is not required to perform any of the responses in a triggered run.
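
The difference from the full problem can be seen in a per-run check. The Python sketch below is an illustration under the same hypothetical encoding of a run as a list of response times per response set (`None` meaning the response is not performed, each response occurring at most once); the name `weakly_satisfies_ogr` is invented here. Unlike a full solution, a run in which nothing is ever performed passes the check.

```python
from typing import List, Optional, Sequence


def weakly_satisfies_ogr(groups: Sequence[Sequence[Optional[int]]]) -> bool:
    """Check one run against the two clauses of weakly solving OGR.

    groups[h][m] is the time at which the m-th response of the (h+1)-th
    response set is performed, or None if it is not performed in the run.
    """
    earlier: List[Optional[int]] = []  # common time of each earlier set
    for group in groups:
        times = set(group)
        # Clause 1: within a set, responses occur together or not at all.
        if len(times) != 1:
            return False
        (t,) = times
        # Clause 2: if this set is performed at t, every earlier set must
        # have been performed at some time no later than t.
        if t is not None and any(e is None or e > t for e in earlier):
            return False
        earlier.append(t)
    return True


print(weakly_satisfies_ogr([[None], [None, None]]))  # nothing performed
print(weakly_satisfies_ogr([[None], [3]]))           # later set without earlier
```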

Recall that in order to prove the relation between solutions to OR and 
nested knowledge in Theorem 1, we had to assume that processes can re- 
call responses that they had performed. We now define a stronger recall 
requirement for nested common knowledge. 

Definition 20 (Group response recall) Let $OGR = (e_t, A^1, \ldots, A^k)$ and
assume that $\mathcal{R} = \mathcal{R}(P, \gamma)$ is a system of runs for a protocol P where all of
the responses may occur, and $\gamma$ is an arbitrary context. Protocol P recalls
group responses for OGR if for all $a \in A^h$ where $1 \le h \le k$, $r \in \mathcal{R}$, $t' \ge t$
and $i \in I^h$, if $(\mathcal{R}, r, t) \models K_i\,\mathrm{occurred}(a)$ then $(\mathcal{R}, r, t') \models K_i\,\mathrm{occurred}(a)$.

A protocol recalls group responses if processes, once they know that a
response has taken place, never forget this fact. We are now ready to prove
that OGR requires nested common knowledge.

Theorem 14 Let $OGR = (e_t, A^1, \ldots, A^k)$, and assume that protocol P
weakly solves OGR in $\gamma$ and that it recalls group responses for OGR. Let
$r \in \mathcal{R}$ be a run in which $e_t$ occurs at time $t_0$, and where the processes in
$I^h$ perform the actions $a^h_1$ to $a^h_{\ell_h}$ simultaneously at $t_h \ge t_{h-1}$, for every
$1 \le h \le k$.

Then $(\mathcal{R}, r, t_h) \models C_{I^h} C_{I^{h-1}} \cdots C_{I^1}(\mathrm{occurred}(e_t) \wedge ND(e_t))$ for every
$1 \le h \le k$.

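Proof We proceed by induction on k.

k = 0: $(\mathcal{R}, r, t_0) \models (\mathrm{occurred}(e_t) \wedge ND(e_t))$ by definition of r.

k > 0: We will use the Induction Rule for common knowledge to
prove the inductive step. Recall that $A^k = (a^k_1, .., a^k_{\ell_k})$. Fix
$h, g \in \{1, .., \ell_k\}$. We first show that

\[ \mathcal{R} \models \mathrm{occurs}(a^k_h) \rightarrow E_{I^k}\bigl(\mathrm{occurs}(a^k_h) \wedge C_{I^{k-1}} \cdots C_{I^1}(\mathrm{occurred}(e_t) \wedge ND(e_t))\bigr). \]

Note that as all responses in $A^k$ are performed simultaneously, we get
$\mathcal{R} \models \mathrm{occurs}(a^k_g) \leftrightarrow \mathrm{occurs}(a^k_h)$. Since whether the (deterministic)
protocol P performs the action $a^k_g$ is a function of $i_g$'s local state, we
have that $\mathcal{R} \models \mathrm{occurs}(a^k_g) \leftrightarrow K_{i_g}\mathrm{occurs}(a^k_g)$. Now using the former
equivalence we get that

(*) $\mathcal{R} \models \mathrm{occurs}(a^k_h) \leftrightarrow K_{i_g}\mathrm{occurs}(a^k_h)$.

Now choose arbitrary r', t'. Suppose that $(\mathcal{R}, r', t') \models \mathrm{occurs}(a^k_h)$. Note
that since P weakly solves OGR, we have both

(i) when $a^k_h$ is performed, the responses in $A^{k-1}$ have already been
performed (or are being performed). Say that these have been
performed at a time $t'_{k-1} \le t'$. And,

(ii) protocol P also weakly solves the sub-problem
$OGR' = (e_t, A^1, \ldots, A^{k-1})$. By the inductive hypothesis, we have
$(\mathcal{R}, r', t'_h) \models C_{I^h} \cdots C_{I^1}(\mathrm{occurred}(e_t) \wedge ND(e_t))$
for all h < k, and in particular for h = k - 1.

As processes recall group responses and $t'_{k-1} \le t'$, we obtain from (ii)
that $(\mathcal{R}, r', t') \models C_{I^{k-1}} \cdots C_{I^1}(\mathrm{occurred}(e_t) \wedge ND(e_t))$. Given this and
the fact that if $(\mathcal{R}, r', t') \not\models \mathrm{occurs}(a^k_h)$ then $(\mathcal{R}, r', t') \models \mathrm{occurs}(a^k_h) \rightarrow \alpha$
for any $\alpha$, we conclude that

\[ \mathcal{R} \models \mathrm{occurs}(a^k_h) \rightarrow C_{I^{k-1}} \cdots C_{I^1}(\mathrm{occurred}(e_t) \wedge ND(e_t)). \]

Combined with (*), we obtain

\[ \mathcal{R} \models \mathrm{occurs}(a^k_h) \rightarrow K_{i_g} C_{I^{k-1}} \cdots C_{I^1}(\mathrm{occurred}(e_t) \wedge ND(e_t)). \]

Since g was arbitrarily chosen, we get $\mathcal{R} \models \mathrm{occurs}(a^k_h) \rightarrow E_{I^k}\mathrm{occurs}(a^k_h)$,
and also $\mathcal{R} \models \mathrm{occurs}(a^k_h) \rightarrow E_{I^k} C_{I^{k-1}} \cdots C_{I^1}(\mathrm{occurred}(e_t) \wedge ND(e_t))$,
ending up with

\[ \mathcal{R} \models \mathrm{occurs}(a^k_h) \rightarrow E_{I^k}\bigl(\mathrm{occurs}(a^k_h) \wedge C_{I^{k-1}} \cdots C_{I^1}(\mathrm{occurred}(e_t) \wedge ND(e_t))\bigr), \]

as required.

Let $\phi = \mathrm{occurs}(a^k_h)$ and $\psi = C_{I^{k-1}} \cdots C_{I^1}(\mathrm{occurred}(e_t) \wedge ND(e_t))$.
Applying the Knowledge Induction Rule we get $\mathcal{R} \models \phi \rightarrow C_{I^k}\psi$ from
$\mathcal{R} \models \phi \rightarrow E_{I^k}(\phi \wedge \psi)$, giving us that

\[ \mathcal{R} \models \mathrm{occurs}(a^k_h) \rightarrow C_{I^k} C_{I^{k-1}} \cdots C_{I^1}(\mathrm{occurred}(e_t) \wedge ND(e_t)). \]

Recalling that $(\mathcal{R}, r, t_k) \models \mathrm{occurs}(a^k_h)$ by assumption, we obtain that
$(\mathcal{R}, r, t_k) \models C_{I^k} C_{I^{k-1}} \cdots C_{I^1}(\mathrm{occurred}(e_t) \wedge ND(e_t))$.

■ Theorem 14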
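
The step labelled "Combined with (*)" in the proof rests on two standard properties of knowledge in a system of runs. The following derivation is a sketch of how they combine, not a quotation of the original argument. Here $\phi$ abbreviates the formula stating that the response in question occurs, $\psi$ abbreviates the nested common knowledge formula for the first k - 1 groups, and validity is taken with respect to the system $\mathcal{R}$.

```latex
\begin{align*}
&\mathcal{R} \models \phi \rightarrow \psi
  && \text{(shown just before the step)}\\
&\mathcal{R} \models K_{i_g}(\phi \rightarrow \psi)
  && \text{(a formula valid in } \mathcal{R} \text{ is known at every point of } \mathcal{R}\text{)}\\
&\mathcal{R} \models K_{i_g}\phi \rightarrow K_{i_g}\psi
  && \text{(distribution of } K_{i_g} \text{ over implication)}\\
&\mathcal{R} \models \phi \rightarrow K_{i_g}\phi
  && \text{(left-to-right direction of } (*)\text{)}\\
&\mathcal{R} \models \phi \rightarrow K_{i_g}\psi
  && \text{(chaining the last two lines)}
\end{align*}
```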



An immediate corollary is that nested common knowledge is also neces- 
sitated in protocols that solve (not weakly solve) OGR. 

Corollary 1 Let $OGR = (e_t, A^1, \ldots, A^k)$, and assume that protocol P solves
OGR in $\gamma$ and that it recalls group responses for OGR. Let $r \in \mathcal{R}$ be a run
in which $e_t$ occurs at time $t_0$, and where the processes in $I^h$ perform the
actions $a^h_1$ to $a^h_{\ell_h}$ simultaneously at $t_h \ge t_{h-1}$, for every $1 \le h \le k$.
Then for every $1 \le h \le k$

\[ (\mathcal{R}, r, t_h) \models C_{I^h} C_{I^{h-1}} \cdots C_{I^1}(\mathrm{occurred}(e_t) \wedge ND(e_t)). \]

As mentioned above, both nested knowledge and common knowledge are 
specific cases of nested common knowledge. As such, Theorems 1 and 2 can 
be derived as further corollaries from the above one. 

To complete the picture, we briefly point out that there are protocols for 
which nested common knowledge gain implies a solution to OGR. 

Definition 21 (Group considerate protocol) Let $OGR = (e_t, A^1, \ldots, A^k)$,
where $A^h = (a^h_1, .., a^h_{\ell_h})$ for every h = 1, .., k. Protocol P is group
considerate with respect to OGR if for each $h \le k$ and $m \le \ell_h$, response $a^h_m$ is
carried out by its respective process $i^h_m$ as soon as $C_{I^h}(\mathrm{occurred}(e_t) \wedge ND(e_t))$
is established, but no sooner.



Lemma 26 Let $OGR = (e_t, A^1, \ldots, A^k)$, and let P be a group considerate
protocol with respect to OGR. If for every $r \in \mathcal{R}^{\max} = \mathcal{R}(P, \gamma^{\max})$ such that $e_t$
occurs at $(i_0, t)$ in r there exists a time t' such that

\[ (\mathcal{R}^{\max}, r, t') \models C_{I^k} C_{I^{k-1}} \cdots C_{I^1}(\mathrm{occurred}(e_t) \wedge ND(e_t)), \]

then P solves OGR.

The proof repeats the one of Lemma 1, with nested common knowledge
replacing nested knowledge. That a process will know immediately that
common knowledge has been achieved is given by the validity $C_G\varphi \leftrightarrow K_g C_G\varphi$
for any $g \in G$.

5.3 Generalized Centipedes

We expect that just as OGR generalizes both OR and SR, a characterizing
causal structure will generalize both the centipede and the centibroom. The
generalized centipede, defined below, offers just this kind of generalization.

Definition 22 (Generalized Centipede) Let $r \in \mathcal{R}^{\max}$, let $I^h \subseteq \mathbb{P}$ for
$1 \le h \le k$ and let $t \le t_1 \le \cdots \le t_k$. A generalized centipede for
$(\theta_0, I^1, \ldots, I^k)$ in (r, t..t') is a sequence of nodes $\theta_0 \rightsquigarrow \theta_1 \rightsquigarrow \cdots \rightsquigarrow \theta_k$ such
that $\theta_0 = (i_0, t)$, and $\theta_h \dashrightarrow (i^h, t')$ holds for all h = 1, ..., k and $i^h \in I^h$.




In chapters 3 and 4 we found it convenient to consider special kinds of 
centipedes and centibrooms, namely centinodes and bridging centibrooms, 
respectively. Once again, the following definition extends both of these 
special kinds. 

Definition 23 (Bridging Generalized Centipede) A generalized centipede
$(\theta_0, \ldots, \theta_k)$ for $((i_0, t), I^1, \ldots, I^k)$ in (r, t..t') is bridging if $\theta_h$ is a bridging
centibroom for $((i_0, t), I^h)$ in (r, t..t') for all h = 1, ..., k.

As the following lemma shows, generalized centipedes and their bridged
sub-kind may be freely interchanged.

Lemma 27 A generalized centipede for $(\theta_0, I^1, \ldots, I^k)$ exists in (r, t..t')
iff a bridging generalized centipede for $(\theta_0, I^1, \ldots, I^k)$ exists in (r, t..t').

Proof That the existence of a bridging generalized centipede implies that
of a generalized centipede is immediate. We now prove the other direction.
Assume that $C = (\theta_0, .., \theta_k)$ is a generalized centipede for $(I^0, \ldots, I^k)$ in
(r, t..t'). We define by induction on $h \le k$ generalized centipedes
$C_h = (\theta'_0, .., \theta'_h, \theta_{h+1}, .., \theta_k)$ in (r, t..t'), in which the nodes $\theta'_0$ to $\theta'_h$ are bridging
centibrooms for $(I^0, \ldots, I^h)$ respectively.

h = 0: By definition, $\theta_0 = (i_0, t)$. As $I^0 = \{i_0\}$, $\theta_0$ is a trivial bridging
centibroom for $I^0$ in (r, t..t'), and hence a bridging generalized centipede
for $(I^0)$.

h > 0: Assume that a bridging generalized centipede
$C_{h-1} = (\theta'_0, .., \theta'_{h-1}, \theta_h, \ldots, \theta_k)$ as described above has been constructed.
By Lemma 7 there exists a node $\theta'_h$ bridging $\theta'_{h-1}$ and $\theta_h$. Since
$\theta_h \dashrightarrow (i^h, t')$ for all $i^h \in I^h$ we get that $\theta'_h$ is a bridging
centibroom for $(I^0, \ldots, I^h)$ in (r, t..t'). Define $C_h = (\theta'_0, .., \theta'_h, \theta_{h+1}, .., \theta_k)$. If
h = k then we are done. Otherwise, since $\theta'_h \dashrightarrow \theta_h \rightsquigarrow \theta_{h+1}$ and
$C_{h-1} = (\theta'_0, .., \theta'_{h-1}, \theta_h, .., \theta_k)$ is a generalized centipede for $(I^0, \ldots, I^k)$
in (r, t..t'), we obtain that $C_h$ is also such a generalized centipede, as
required.



We now formulate the Generalized Centipede Theorem, which will be
proved in the next section. The theorem is proved for protocols weakly
solving the OGR problem. An immediate corollary gives us that the same
conditions hold for protocols that (non-weakly) solve OGR. This latter
corollary can be used to derive Theorems 4 and 9 as immediate further
corollaries.

Theorem 15 Let $OGR = (e_t, A^1, \ldots, A^k)$, and assume that protocol P
weakly solves OGR in $\gamma$. Let $r \in \mathcal{R}(P, \gamma^{\max})$ be a run in which $e_t$ occurs
at time $t = t_0$, and where the processes in $I^h$ perform the actions $a^h_1$ to $a^h_{\ell_h}$
simultaneously at $t_h \ge t_{h-1}$, for every $1 \le h \le k$, with $t' = t_k$.
Then there is a generalized centipede for $(I^1, \ldots, I^k)$ in (r, t..t').

5.3.1 Nested Common Knowledge Gain Requires General- 
ized Centipedes 

We start by revisiting the relation between past, the past causal cone, and
knowledge. We repeat here the definition of past and fut cones, as we now
wish to make use of the complete data that is encoded in the definitions.

Definition 11 (reprinted) We define the future causal cone of a node $\alpha$
(in run r) to be

\[ \mathsf{fut}(r, \alpha) = \{(\theta, ND_\theta) : \alpha \rightsquigarrow \theta \text{ in } r \text{ and } ND_\theta \text{ is the set of ND events and initial states in } \theta \text{ in } r\}. \]

Similarly, the past causal cone of $\alpha$ is

\[ \mathsf{past}(r, \alpha) = \{(\theta, ND_\theta) : \theta \rightsquigarrow \alpha \text{ in } r \text{ and } ND_\theta \text{ is the set of ND events and initial states in } \theta \text{ in } r\}. \]

Although past and fut are sets that contain pairs of node and event- 
set, we will frequently treat them simply as sets of nodes, when the second 
component of the pair is irrelevant in the context. 

Lemma 6 showed that the local state of a process, and hence also its
knowledge state, is determined by its past causal cone. A straightforward
extension shows that common knowledge of a group of processes is
determined by the union of their past cones.

Definition 24 (Group past cones) For every $G \subseteq \mathbb{P}$ and time t, we will
write

1. $\bigcup\mathsf{Past}(r, G, t)$ to denote the set $\bigcup_{g \in G} \mathsf{past}(r, (g, t))$, and

2. $\bigcap\mathsf{Past}(r, G, t)$ to denote the set $\bigcap_{g \in G} \mathsf{past}(r, (g, t))$.



Lemma 28 Fix $r, r' \in \mathcal{R}^{\max}$, $G \subseteq \mathbb{P}$ and time t, such that
$\bigcup\mathsf{Past}(r, G, t) = \bigcup\mathsf{Past}(r', G, t)$.

For every $\varphi \in \mathcal{L}$, if $(\mathcal{R}^{\max}, r, t) \models C_G\varphi$ then $(\mathcal{R}^{\max}, r', t) \models C_G\varphi$.

Proof Suppose that $(\mathcal{R}^{\max}, r', t) \not\models C_G\varphi$. Then there exists some
sequence $(g_1, g_2, .., g_k)$ such that $(\mathcal{R}^{\max}, r', t) \not\models K_{g_1} K_{g_2} \cdots K_{g_k}\varphi$. Write
$\varphi' = K_{g_2} \cdots K_{g_k}\varphi$. We get $(\mathcal{R}^{\max}, r', t) \not\models K_{g_1}\varphi'$. From
$\bigcup\mathsf{Past}(r, G, t) = \bigcup\mathsf{Past}(r', G, t)$ we obtain, in particular, that
$\mathsf{past}(r, (g_1, t)) = \mathsf{past}(r', (g_1, t))$. By Lemma 6 we get that
$(\mathcal{R}^{\max}, r, t) \not\models K_{g_1}\varphi'$, which gives us $(\mathcal{R}^{\max}, r, t) \not\models K_{g_1} K_{g_2} \cdots K_{g_k}\varphi$,
contradicting the assumption that $(\mathcal{R}^{\max}, r, t) \models C_G\varphi$. ■

The next two lemmas give us an even greater focus on the effect of
the causal past upon the current epistemic state. The first lemma points
out that the causal past itself is fully determined by those nondeterministic
events that occur in it. We add the following definition.

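Definition 25 (Nondeterministic past) For every $r \in \mathcal{R}^{\max}$, node $\theta$ and
$G \subseteq \mathbb{P}$,

1. Define $\mathsf{NDpast}(r, \theta) = \{(\psi, ND_\psi) \in \mathsf{past}(r, \theta) : ND_\psi \neq \emptyset\}$. This is the
set of nodes $\psi$ such that $\psi \in \mathsf{past}(r, \theta)$ and either an ND event occurs
at $\psi$ in r or $\psi$ is an initial state.

2. Define $\bigcup\mathsf{NDPast}(r, G, t) = \bigcup_{g \in G} \mathsf{NDpast}(r, (g, t))$.

3. Define $\bigcap\mathsf{NDPast}(r, G, t) = \bigcap_{g \in G} \mathsf{NDpast}(r, (g, t))$.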
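
The cone definitions can be illustrated on a small explicit graph. The Python sketch below is an assumption-laden illustration rather than the construction used in this chapter: it takes the direct syncausal steps of one run as a given set of edges between nodes (process, time), treats the past as the reflexive and transitive closure of those edges, and takes the ND events and initial states of each node from a dictionary. All names in it (`past`, `nd_past`, `union_nd_past`, `inter_nd_past`) are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Node = Tuple[str, int]  # (process, time)
Edge = Tuple[Node, Node]


def past(edges: Set[Edge], node: Node) -> Set[Node]:
    """All nodes that reach `node` along the given edges, including `node`."""
    preds: Dict[Node, Set[Node]] = {}
    for src, dst in edges:
        preds.setdefault(dst, set()).add(src)
    seen = {node}
    stack = [node]
    while stack:
        cur = stack.pop()
        for p in preds.get(cur, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def nd_past(edges: Set[Edge], nd: Dict[Node, Set[str]],
            node: Node) -> Set[Tuple[Node, FrozenSet[str]]]:
    """Pairs (psi, ND_psi) in the past of `node` with a non-empty ND set."""
    return {(n, frozenset(nd[n])) for n in past(edges, node) if nd.get(n)}


def union_nd_past(edges, nd, group: Iterable[str], t: int):
    """Union over the group of the ND-pasts of the nodes (g, t)."""
    out = set()
    for g in group:
        out |= nd_past(edges, nd, (g, t))
    return out


def inter_nd_past(edges, nd, group: Iterable[str], t: int):
    """Intersection over the group of the ND-pasts of the nodes (g, t)."""
    cones = [nd_past(edges, nd, (g, t)) for g in group]
    return set.intersection(*cones) if cones else set()


# Toy run: i sends one message at time 0 that j receives at time 1.
EDGES = {(("i", 0), ("i", 1)), (("j", 0), ("j", 1)), (("i", 0), ("j", 1))}
ND = {("i", 0): {"init_i"}, ("j", 0): {"init_j"}}
print(sorted(n for n, _ in union_nd_past(EDGES, ND, ["i", "j"], 1)))
print(sorted(n for n, _ in inter_nd_past(EDGES, ND, ["i", "j"], 1)))
```

In the toy run the initial node of i lies in both cones, while the initial node of j lies only in j's own cone, so the intersection is strictly smaller than the union.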



Lemma 29 Fix $r, r' \in \mathcal{R}^{\max}$ and a node (i, t).

If $\mathsf{NDpast}(r, (i, t)) = \mathsf{NDpast}(r', (i, t))$ then $\mathsf{past}(r, (i, t)) = \mathsf{past}(r', (i, t))$.

Proof We prove the claim by induction on t.

t = 0: $\mathsf{past}(r, (i, 0)) = \{((i, 0), ND_{(i,0)})\}$, the singleton initial local state of i in
r. By assumption it is the same state as in r', and hence
$\mathsf{past}(r, (i, 0)) = \mathsf{past}(r', (i, 0))$.

t > 0: Suppose that $r_i(t) \neq r'_i(t)$. Then, wlog, by Lemma 6 there exists
some $(\theta, ND_\theta) \in \mathsf{past}(r, (i, t))$ such that $(\theta, ND_\theta) \notin \mathsf{past}(r', (i, t))$. If
$\theta \rightsquigarrow (i, t)$ in r' then it must be that $ND_\theta$ is different in r and r',
in which case we get that $\mathsf{NDpast}(r, (i, t)) \neq \mathsf{NDpast}(r', (i, t))$, contra
the lemma's assumptions. Hence there must exist some $\theta$ such that
$\theta \rightsquigarrow (i, t)$ in r but $\theta \not\rightsquigarrow (i, t)$ in r'.

By Lemma 7 there exists a bridge $\psi$ such that $\theta \rightsquigarrow \psi \dashrightarrow (i, t)$. As
$\theta \notin \mathsf{past}(r', (i, t))$, it must be that $\theta \not\dashrightarrow (i, t)$, and hence that $\psi \neq \theta$.

We now consider two cases:

$\psi = (i, t)$: In this case, as $\mathsf{NDpast}(r, (i, t)) = \mathsf{NDpast}(r', (i, t))$, $\theta \rightsquigarrow \psi$
and an ND event occurs at $\theta$ in r, it must be that $\theta \in \mathsf{past}(r', (i, t))$,
contradicting the assumption that $\theta \notin \mathsf{past}(r', (i, t))$.

$\psi \neq (i, t)$: In this case from Lemma 4, it must be that $\psi = (j, t')$ for
some t' < t. From $\mathsf{NDpast}(r, (i, t)) = \mathsf{NDpast}(r', (i, t))$ and
$\psi \in \mathsf{past}(r, (i, t))$ we obtain that $\mathsf{NDpast}(r, (j, t')) = \mathsf{NDpast}(r', (j, t'))$.
By the inductive hypothesis $\mathsf{past}(r, (j, t')) = \mathsf{past}(r', (j, t'))$ and
hence, as $\theta \in \mathsf{past}(r, (j, t'))$ it must also be that $\theta \in \mathsf{past}(r', (i, t))$,
again contra the assumption that $\theta \notin \mathsf{past}(r', (i, t))$.



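The next lemma follows by composing the two previous lemmas.

Lemma 30 Fix $r, r' \in \mathcal{R}^{\max}$, time t and $G \subseteq \mathbb{P}$. If
$\bigcup\mathsf{NDPast}(r, G, t) = \bigcup\mathsf{NDPast}(r', G, t)$, then
$(\mathcal{R}^{\max}, r, t) \models C_G\varphi$ implies $(\mathcal{R}^{\max}, r', t) \models C_G\varphi$.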



Definition 26 (Centibroom past) Let $r \in \mathcal{R}^{\max}$, $G \subseteq \mathbb{P}$ and fix time t.
The centibroom past of G in r at time t is the set

\[ \mathsf{BroomPast}(r, t, G) = \{(\theta, ND_\theta) \in \textstyle\bigcup\mathsf{NDPast}(r, G, t) \mid \text{there exists a centibroom for } (\theta, G) \text{ in } (r, 0..t)\}. \]

As before, in the case of past and fut, we will often treat BroomPast(r, t, G)
as a set of nodes, rather than a set of pairs. Note that
$\mathsf{BroomPast}(r, t, G) \subseteq \bigcap\mathsf{NDPast}(r, G, t) \subseteq \bigcup\mathsf{NDPast}(r, G, t)$.

When $G = \{i, j\}$ we partition the nodes in $\bigcup\mathsf{NDPast}(r, G, t)$ based on
the existence of bridging nodes. Recall that a node b bridges $\theta$ and $\psi$ if
$\theta \rightsquigarrow b \dashrightarrow \psi$ and there is no node b' such that $b' \neq b$ and $\theta \rightsquigarrow b' \dashrightarrow b$.
Note that there may exist more than one bridge node connecting $\theta$ and $\psi$, if
there is more than one syncausal path between the nodes. In the following
partition, for each node $\theta \in \bigcup\mathsf{NDPast}(r, G, t)$ we look at the nodes bridging
$\theta$ and (i, t) as well as those nodes bridging $\theta$ to (j, t). Moreover, we focus
on those bridging nodes that are earliest: (i, t) is an earliest node bridging
$\theta$ and $\psi$ if it bridges the two nodes and if there is no alternate bridging node
(j, t') for $\theta$ and $\psi$ such that t' < t.

Definition 27 (Partitioning $\bigcup\mathsf{NDPast}(r, G, t)$) Given a run $r \in \mathcal{R}^{\max}$,
a time t and $G = \{i, j\}$, for each $(\theta, ND_\theta) \in \bigcup\mathsf{NDPast}(r, G, t)$ we have
$\theta \rightsquigarrow (i, t)$ or $\theta \rightsquigarrow (j, t)$. Let $t^\theta_i$ denote the time of the earliest bridge nodes
between $\theta$ and (i, t), with $t^\theta_i = \infty$ if there are no bridges between $\theta$ and
(i, t) (i.e. $\theta \not\rightsquigarrow (i, t)$). Similarly define $t^\theta_j$. The set $\bigcup\mathsf{NDPast}(r, G, t)$ can be
partitioned into the following subsets:

1. $\mathsf{BroomPast}(r, t, G)$

2. $\mathsf{Br}_i = \{(\theta, ND_\theta) \in \bigcup\mathsf{NDPast}(r, G, t) - \mathsf{BroomPast}(r, t, G) : t^\theta_i < t^\theta_j\}$

3. $\mathsf{Br}_j = \{(\theta, ND_\theta) \in \bigcup\mathsf{NDPast}(r, G, t) - \mathsf{BroomPast}(r, t, G) : t^\theta_j < t^\theta_i\}$

4. $\mathsf{Br}_{same} = \{(\theta, ND_\theta) \in \bigcup\mathsf{NDPast}(r, G, t) - \mathsf{BroomPast}(r, t, G) : t^\theta_i = t^\theta_j < \infty\}$

Notice that Definition 27 first constructs the cell BroomPast(r, t, G) for all
pairs $(\theta, ND_\theta) \in \bigcup\mathsf{NDPast}(r, \{i, j\}, t)$ for which there exists a centibroom for
$(\theta, \{i, j\})$ in r, and then takes all the remaining nodes in
$\bigcup\mathsf{NDPast}(r, \{i, j\}, t)$ and partitions them further into $\mathsf{Br}_i$, $\mathsf{Br}_j$ and $\mathsf{Br}_{same}$.
In particular, this means that in case of some $\theta \in \mathsf{Br}_{same}$, where $t^\theta_i = t^\theta_j$,
the nodes bridging $\theta$ to (i, t) will be distinct from those bridging $\theta$ to (j, t),
or otherwise we would have that $\theta \in \mathsf{BroomPast}(r, t, G)$.

Once again using $t^\theta_i$ to denote the time of the earliest bridge nodes
between $\theta$ and (i, t), with $t^\theta_i = \infty$ if there are no bridges, we divide the cells of
Definition 27 further, by "slicing" each cell according to the time associated
with the earliest bridging nodes.

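Definition 28 (Slicing the partition cells of $\bigcup\mathsf{NDPast}(r, \{i, j\}, t)$)

Given a run $r \in \mathcal{R}^{\max}$, a time t and $G = \{i, j\}$, let BroomPast(r, t, G), $\mathsf{Br}_i$, $\mathsf{Br}_j$
and $\mathsf{Br}_{same}$ form the partition in Definition 27.

We divide some of the partition cells further into time-slices in the
following way:

1. $\mathsf{Br}_i = \bigcup_{t' \le t} \mathsf{Br}_i(t')$, where $\mathsf{Br}_i(t') = \{(\theta, ND_\theta) \in \mathsf{Br}_i \mid t^\theta_i = t'\}$,

2. $\mathsf{Br}_j = \bigcup_{t' \le t} \mathsf{Br}_j(t')$, where $\mathsf{Br}_j(t') = \{(\theta, ND_\theta) \in \mathsf{Br}_j \mid t^\theta_j = t'\}$, and

3. $\mathsf{Br}_{same} = \bigcup_{t' \le t} \mathsf{Br}_{same}(t')$, where
$\mathsf{Br}_{same}(t') = \{(\theta, ND_\theta) \in \mathsf{Br}_{same} \mid t^\theta_i = t^\theta_j = t'\}$.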
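
The partition and its time-slices amount to a simple case analysis on the two earliest bridge times. The Python sketch below illustrates that case analysis only. It assumes the membership of a node in the centibroom past and its two earliest bridge times have already been computed elsewhere, and the name `classify` is hypothetical.

```python
import math
from typing import Optional, Tuple


def classify(in_broom_past: bool, t_i: float,
             t_j: float) -> Tuple[str, Optional[float]]:
    """Return (cell, slice time) for a node of the union ND-past of {i, j}.

    t_i and t_j are the times of the earliest bridge nodes towards (i, t)
    and (j, t); math.inf stands for "no bridge exists".
    """
    if in_broom_past:
        return ("BroomPast", None)  # this cell is not sliced
    if math.isinf(t_i) and math.isinf(t_j):
        raise ValueError("a node of the union ND-past reaches (i, t) or (j, t)")
    if t_i < t_j:
        return ("Br_i", t_i)
    if t_j < t_i:
        return ("Br_j", t_j)
    return ("Br_same", t_i)


print(classify(False, 2, 5))
print(classify(False, math.inf, 3))
print(classify(False, 4, 4))
```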

For every $G \subseteq \mathbb{P}$, we will say that runs r and r' are G-reachable at time
t if there exists a sequence $(r_0, \ldots, r_k)$ such that $r = r_0$, $r' = r_k$, and for each
h < k there exists $i_h \in G$ such that $r_h \sim_{(i_h, t)} r_{h+1}$. The next lemma shows
that the nodes in cells $\mathsf{Br}_i$, $\mathsf{Br}_j$ and $\mathsf{Br}_{same}$ are not essential for determining
G-reachability for groups of size 2.
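
G-reachability is a graph search over runs, and by the standard characterization of common knowledge a fact is common knowledge among G at a point exactly when it holds at every G-reachable point. The Python sketch below illustrates this on a finite set of runs at one fixed time. It assumes the local states at that time are given by a caller-supplied function, and the names `g_reachable` and `common_knowledge` are hypothetical.

```python
from typing import Callable, Hashable, Iterable, Set

Run = Hashable


def g_reachable(runs: Iterable[Run],
                local_state: Callable[[Run, str], Hashable],
                group: Iterable[str],
                start: Run) -> Set[Run]:
    """Runs linked to `start` by a chain of steps, each step joining two runs
    in which some member of the group has the same local state."""
    runs = list(runs)
    group = list(group)
    seen = {start}
    frontier = [start]
    while frontier:
        cur = frontier.pop()
        for other in runs:
            if other in seen:
                continue
            if any(local_state(cur, i) == local_state(other, i) for i in group):
                seen.add(other)
                frontier.append(other)
    return seen


def common_knowledge(runs, local_state, group, start,
                     fact: Callable[[Run], bool]) -> bool:
    """True if `fact` holds in every run that is G-reachable from `start`."""
    return all(fact(r) for r in g_reachable(runs, local_state, group, start))


# Toy system: local states of processes i and j at the fixed time t.
STATES = {
    "r0": {"i": "a", "j": "x"},
    "r1": {"i": "a", "j": "y"},
    "r2": {"i": "b", "j": "y"},
    "r3": {"i": "c", "j": "z"},
}


def toy_state(run, proc):
    return STATES[run][proc]


print(sorted(g_reachable(STATES, toy_state, ["i", "j"], "r0")))
```

In the toy system r0 and r2 share no local state, yet r2 is reachable from r0 through r1, so any fact that fails in r2 is not common knowledge at r0.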
Lemma 31 Fix $r, r' \in \mathcal{R}^{\max}$, time t, $G = \{i, j\} \subseteq \mathbb{P}$. There exists $c \ge 0$
and a sequence of runs $(r_1, r_2, .., r_{2c})$ such that r and $r_{2c}$ are G-reachable,
$\mathsf{BroomPast}(r_{2c}, t, G) = \mathsf{BroomPast}(r, t, G)$, and
$\mathsf{Br}_i^{r_{2c}} = \mathsf{Br}_j^{r_{2c}} = \mathsf{Br}_{same}^{r_{2c}} = \emptyset$.

Proof We prove by induction that for each $d \ge 0$ there exists a
run $r_{2d}$ such that

1. r and $r_{2d}$ are G-reachable,

2. $\mathsf{BroomPast}(r_{2d}, t, G) = \mathsf{BroomPast}(r, t, G)$,

3. $\mathsf{Br}_i^{r_{2d}} = \bigcup_{1 \le h \le t-d} \mathsf{Br}_i^{r}(h)$,

4. $\mathsf{Br}_j^{r_{2d}} = \bigcup_{1 \le h \le t-d} \mathsf{Br}_j^{r}(h)$, and

5. $\mathsf{Br}_{same}^{r_{2d}} = \bigcup_{1 \le h \le t-d} \mathsf{Br}_{same}^{r}(h)$.

d = 0: In this case $r_{2d} = r_0 = r$, and all requirements trivially hold (for
example, $\mathsf{Br}_i^{r_0} = \mathsf{Br}_i^{r} = \bigcup_{1 \le h \le t} \mathsf{Br}_i^{r}(h)$).

d > 0: Inductively assume the existence of a run $r_{2d-2}$ satisfying all
requirements. Note that all $\theta$ in $\mathsf{Br}_i^{r_{2d-2}}$, in $\mathsf{Br}_j^{r_{2d-2}}$ and in $\mathsf{Br}_{same}^{r_{2d-2}}$ occur
no later than at t - d + 1. Let $r_{2d-1}$ be a run identical to $r_{2d-2}$, except
that for every $\theta \in \mathsf{Br}_i^{r_{2d-2}}(t-d+1) \cup \mathsf{Br}_{same}^{r_{2d-2}}(t-d+1)$, $\theta \rightsquigarrow (i, t)$ but
$\theta \not\rightsquigarrow (j, t)$.

We now show that such a run exists. Iterating over every node
$\theta \in \mathsf{Br}_i^{r_{2d-2}}(t-d+1) \cup \mathsf{Br}_{same}^{r_{2d-2}}(t-d+1)$, we examine two possible cases:

$\theta \in \mathsf{Br}_{same}^{r_{2d-2}}(t-d+1)$: For every node $b_j$ bridging $\theta$ and (j, t),
arbitrarily choose enough early events on the path $\theta \rightsquigarrow b_j$ that occur
in $r_{2d-2}$, and cancel their occurrence in $r_{2d-1}$, so as to make sure
that $\theta \not\rightsquigarrow b_j$ in $r_{2d-1}$. This is possible since $\theta \not\dashrightarrow b_j$, or we would
have that $\theta \in \mathsf{Br}_j$ or $\theta \in \mathsf{BroomPast}(r, t, G)$.
Since by assumption $\theta \notin \mathsf{BroomPast}(r_{2d-2}, t, G)$, changing
occurrences in $\mathsf{fut}(r_{2d-1}, \theta)$ does not alter the set
$\mathsf{BroomPast}(r_{2d-1}, t, G)$. Moreover, as there exists a bridge $b_i \neq b_j$
for all $b_j$, making changes in the nodes of
$\mathsf{fut}(r_{2d-1}, \theta) \cap \mathsf{past}(r_{2d-1}, (j, t))$ does not affect $\mathsf{NDpast}(r_{2d-2}, (i, t))$.

$\theta \in \mathsf{Br}_i^{r_{2d-2}}(t-d+1)$: If $\theta \not\rightsquigarrow (j, t)$ then $\theta$ does not require that we
alter $r_{2d-1}$ with respect to the current $r_{2d-2}$. Otherwise, $\theta \rightsquigarrow (j, t)$,
but for every node $(b_j, t_j)$ bridging $\theta$ and (j, t) in $r_{2d-2}$,
there exists some $(b_i, t-d+1)$ bridging $\theta$ and (i, t) in $r_{2d-2}$ such
that $t - d + 1 < t_j$. It could be that $(b_i, t-d+1) \rightsquigarrow (b_j, t_j)$ for
some such bridges, or it could be that there is no causal relation
between the bridges.

In either case, $(b_i, t-d+1) \not\dashrightarrow (b_j, t_j)$ or we would have that
$\theta \in \mathsf{BroomPast}(r, t, G)$. In the first case, we choose $r_{2d-1}$ such
that enough early receive events are canceled along every path
from $(b_i, t-d+1)$ to $(b_j, t_j)$, so that $\theta \not\rightsquigarrow (b_j, t_j)$ for every such
bridge. In the second case we choose $r_{2d-1}$ such that early receive
events can be canceled anywhere along the path from $\theta$ to $(b_j, t_j)$,
once again resulting in $\theta \not\rightsquigarrow (b_j, t_j)$. Having gone over all nodes
bridging $\theta$ and (j, t) and removed bridges from $r_{2d-1}$, we end up
with $\theta \not\rightsquigarrow (j, t)$ in $r_{2d-1}$.

We get that $\mathsf{BroomPast}(r_{2d-1}, t, G) = \mathsf{BroomPast}(r_{2d-2}, t, G)$, for
the same reasons as in the previous case. Moreover, the early
receives canceled in $r_{2d-1}$ with respect to $r_{2d-2}$ are either not
in $\mathsf{past}(r_{2d-1}, (i, t))$, or are in nodes $\psi$ such that $\psi \dashrightarrow (i, t)$. In
either case then, canceling early receives does not alter the set
$\mathsf{NDpast}(r_{2d-2}, (i, t))$ and we have
$\mathsf{NDpast}(r_{2d-1}, (i, t)) = \mathsf{NDpast}(r_{2d-2}, (i, t))$.

In both cases examined we get that
$\mathsf{NDpast}(r_{2d-1}, (i, t)) = \mathsf{NDpast}(r_{2d-2}, (i, t))$
and hence that $(r_{2d-1})_i(t) = (r_{2d-2})_i(t)$. We also get that
$\mathsf{BroomPast}(r_{2d-1}, t, G) = \mathsf{BroomPast}(r_{2d-2}, t, G)$.
Therefore, based on the inductive hypothesis we obtain that

• $(r_{2d-1})_i(t) = r_i(t)$, i.e. the local states of i at time t in runs $r_{2d-1}$
and r are identical, and

• $\mathsf{BroomPast}(r_{2d-1}, t, G) = \mathsf{BroomPast}(r, t, G)$.

Moreover, $\mathsf{Br}_i^{r_{2d-1}}(t-d+1) = \emptyset$, and again based on induction we get that
$\mathsf{Br}_i^{r_{2d-1}} = \bigcup_{1 \le h \le t-d} \mathsf{Br}_i^{r}(h)$. Finally we have for every
$\theta \in \mathsf{Br}_{same}^{r_{2d-2}}(t-d+1)$ that $\theta \not\rightsquigarrow (j, t)$ and that, as we only remove early receives,
$\mathsf{Br}_{same}^{r_{2d-1}}(t-d+1) \subseteq \mathsf{Br}_{same}^{r_{2d-2}}(t-d+1)$.

We now apply the same arguments in order to choose the run $r_{2d}$,
replacing j with i whenever possible. For every $\theta \in \mathsf{Br}_{same}^{r_{2d-1}}(t-d+1)$,
we cancel enough early receives so as to make sure that $\theta \not\rightsquigarrow (i, t)$. As
we also had that $\theta \not\rightsquigarrow (j, t)$ for every $\theta \in \mathsf{Br}_{same}^{r_{2d-1}}(t-d+1)$, we end up
with $\mathsf{Br}_{same}^{r_{2d}}(t-d+1) = \emptyset$. Summing up, we get that

$r_{2d-1} \sim_{(i,t)} r_{2d-2}$ and $r_{2d} \sim_{(j,t)} r_{2d-1}$,

$\mathsf{Br}_i^{r_{2d}} = \bigcup_{1 \le h \le t-d} \mathsf{Br}_i^{r}(h)$,

$\mathsf{Br}_j^{r_{2d}} = \bigcup_{1 \le h \le t-d} \mathsf{Br}_j^{r}(h)$, and finally that

$\mathsf{Br}_{same}^{r_{2d}} = \bigcup_{1 \le h \le t-d} \mathsf{Br}_{same}^{r}(h)$,

and the induction is complete.

In particular, for c = t we get the lemma's required result. ■

We are now ready to prove the following theorem, showing that the state 
of common knowledge in a group of processes G is characterized in a precise 
sense by those nodes of their pasts in which an ND event occurs and which 
are centibroom-related to all of the processes in G. 

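Theorem 16 Fix $r, r' \in \mathcal{R}^{\max}$, time t and $G \subseteq \mathbb{P}$.

If $\mathsf{BroomPast}(r, t, G) = \mathsf{BroomPast}(r', t, G)$ then
$(\mathcal{R}^{\max}, r, t) \models C_G\varphi$ iff $(\mathcal{R}^{\max}, r', t) \models C_G\varphi$.

Proof Cases where |G| = 0 are trivial, and those where |G| = 1 are solved
using Lemma 29. So assume that $|G| \ge 2$. Suppose, towards a contradiction,
that $(\mathcal{R}^{\max}, r, t) \models C_G\varphi$ while $(\mathcal{R}^{\max}, r', t) \not\models C_G\varphi$.
Then there exists some formula $\varphi' = K_{i_1} K_{i_2} \cdots K_{i_k}\varphi$ and some
$G' = \{i, j\} \subseteq G$ such that $(\mathcal{R}^{\max}, r, t) \models C_{G'}\varphi'$ but $(\mathcal{R}^{\max}, r', t) \not\models C_{G'}\varphi'$.

From Lemma 31 we get that there exists a sequence of runs $(r_1, .., r_k)$
such that

(i) $r \sim_{(i,t)} r_1 \sim_{(j,t)} r_2 \sim_{(i,t)} \cdots \sim_{(i,t)} r_{k-1} \sim_{(j,t)} r_k$, and

(ii) $\mathsf{BroomPast}(r_k, t, G') = \mathsf{BroomPast}(r, t, G')$, and

(iii) $\mathsf{Br}_i^{r_k} = \mathsf{Br}_j^{r_k} = \mathsf{Br}_{same}^{r_k} = \emptyset$.

From (i) above and from $(\mathcal{R}^{\max}, r, t) \models C_{G'}\varphi'$ we get that
$(\mathcal{R}^{\max}, r_k, t) \models C_{G'}\varphi'$.

Again applying Lemma 31 we obtain a sequence of runs $(r'_1, .., r'_k)$ such
that

(i') $r' \sim_{(i,t)} r'_1 \sim_{(j,t)} r'_2 \sim_{(i,t)} \cdots \sim_{(i,t)} r'_{k-1} \sim_{(j,t)} r'_k$, and

(ii') $\mathsf{BroomPast}(r'_k, t, G') = \mathsf{BroomPast}(r', t, G')$, and

(iii') $\mathsf{Br}_i^{r'_k} = \mathsf{Br}_j^{r'_k} = \mathsf{Br}_{same}^{r'_k} = \emptyset$.

From (i') above and from $(\mathcal{R}^{\max}, r', t) \not\models C_{G'}\varphi'$ we get that
$(\mathcal{R}^{\max}, r'_k, t) \not\models C_{G'}\varphi'$.

Note that given (ii), (iii), (ii'), (iii') and from

$\mathsf{BroomPast}(r, t, G') = \mathsf{BroomPast}(r', t, G')$

we get that $\bigcup\mathsf{NDPast}(r_k, G', t) = \bigcup\mathsf{NDPast}(r'_k, G', t)$. By using the
Theorem's assumptions regarding ND events and initial states and Lemma 30,
we obtain that $(\mathcal{R}^{\max}, r_k, t) \models C_{G'}\varphi'$ implies $(\mathcal{R}^{\max}, r'_k, t) \models C_{G'}\varphi'$,
contradicting the above result $(\mathcal{R}^{\max}, r'_k, t) \not\models C_{G'}\varphi'$. ■ Theorem 16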

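Theorem 16 can be weakened into the following useful corollary.
Considering that $\mathsf{BroomPast}(r, t, G) \subseteq \bigcap\mathsf{NDPast}(r, G, t)$, we get

Corollary 2 Fix $r, r' \in \mathcal{R}^{\max}$, time t and $G \subseteq \mathbb{P}$. If
$\bigcap\mathsf{NDPast}(r, G, t) = \bigcap\mathsf{NDPast}(r', G, t)$ then
$(\mathcal{R}^{\max}, r, t) \models C_G\varphi$ implies $(\mathcal{R}^{\max}, r', t) \models C_G\varphi$.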

At long last we are ready to prove that nested common knowledge
gain necessitates the existence of a generalized centipede that relates the
process groups with the triggering node. The proof proceeds much in the
same fashion as that of Theorem 6, but process groups have replaced
individual processes. Thus, Theorem 17 generalizes both Theorems 6 and 10.

Theorem 17 (Nested Common Knowledge Gain) Let P be a deterministic
protocol, $I^h \subseteq \mathbb{P}$ for h = 1, ..., k, and let $r \in \mathcal{R}^{\max} = \mathcal{R}(P, \gamma^{\max})$.
Assume that e is an ND event at $(i_0, t)$ in r. If

\[ (\mathcal{R}^{\max}, r, t') \models C_{I^k} C_{I^{k-1}} \cdots C_{I^1}(\mathrm{occurred}(e) \wedge ND(e)), \]

then there is a generalized centipede for $(I^1, \ldots, I^k)$ in (r, t..t').

Proof We shall prove the contrapositive form: if no bridging generalized 
centipede for (/^, . . . , 7*^) exists in (r, t..t'), then 

^-jimax^ r, t') CjhCih-i ■ ■ ■ Gji (occurred (e) A ND(e)). 
We reason by induction on fc > 1: 
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k = 1 By assumption, there is no generalized centipede for (I^) in (r, t..t'). 

Hence, by definition of generalized centipede there is no centibroom 
for {{io,t), I^) in (r, t..t'). By Theorem 10 it follows that (7^'""^, r, t') 
C/i (occurred (e) AND(e)), as claimed. 

k > 2 Assume inductively that the claim holds for k—l. Moreover, assume 
that no bridging generalized centipede for (I^, ..,1^) exists in {r,t..t'). 
For every r' G 7^'""^ let 

(^0, ■ ■ ■ 1 V'fc-i) is a bridging generalized centipede 1 
for in (r',t..t') /' 

Observe that for every V'fc-i ^ there is no centibroom node ip 
for l'') in {r,t..t'). Otherwise, by Lemma 18, there would exist 

a bridging centibroom tp' for I^)^ and {ipQ, . . . , V-'fe-i? v') would 

be a bridging generalized centipede for (/^, . . . ,/''), contradicting our 
assumption. Thus, C" fl Broom Past(r', G) = 0. 

Choose r' G 7^"*"^ such that 

(i) the environment's actions at all nodes in past(r, 9) for every 6 G 
BroomPast(r, are identical to those in r; and 

(ii) all messages delivered to nodes not in past(r, 9) for any of the 
9 G Broom Past(r, t', G), are delivered at the maximal possible 
transmission time according to the bounds maXy . 

To see that such a run $r'$ indeed exists in $\mathcal{R}^{max}$ we note that clauses (i)
and (ii) relate to different sets of nodes, that it is impossible by definition
of $\mathrm{BroomPast}(r, t', G)$ that there exists some $\theta \notin \mathrm{BroomPast}(r, t', G)$
such that $\theta \in \mathrm{past}(r, \psi)$ for some $\psi \in \mathrm{BroomPast}(r, t', G)$, and that by
definition all early message receives can be delayed, independent of
the run's past or concurrent events. Since $\mathcal{R}^{max}$ contains all runs of $P$
in $\gamma^{max}$, it must include $r'$.

Notice that by construction of $r'$ we have that $\alpha \rightsquigarrow \alpha'$ holds in $r'$ only
if $\alpha \rightsquigarrow \alpha'$ in $r$, and that every early receive in $r'$ is an early receive
in $r$. Considering that bounds are universal in all runs, we obtain
that every bridge node in $r'$ is also a bridge node in $r$, and hence that
$C^{r'} \subseteq C^{r}$. By definition of $r'$, and since $C^{r} \cap \mathrm{BroomPast}(r, t', G) = \emptyset$,
none of the nodes in the set $C^{r}$, and hence also in $C^{r'}$, experiences an
early receive in $r'$. Yet from Lemma 8 and from $(i_0, t) \notin C^{r'}$ it follows
that every node $\theta' \in C^{r'}$ must be a nontrivial bridge node in $r'$, thus
experiencing an early receive. We therefore conclude that $C^{r'} = \emptyset$.

By the inductive hypothesis we obtain from this that
$(\mathcal{R}^{max}, r', t') \not\models C_{I^{k-1}} \cdots C_{I^1}(occurred(e) \wedge ND(e))$, and using the Knowledge Axiom
we get that $(\mathcal{R}^{max}, r', t') \not\models C_{I^k} C_{I^{k-1}} \cdots C_{I^1}(occurred(e) \wedge ND(e))$. By
applying Theorem 16 we get that

$(\mathcal{R}^{max}, r, t') \not\models C_{I^k} C_{I^{k-1}} \cdots C_{I^1}(occurred(e) \wedge ND(e))$,

and we are done.

■ Theorem 17

Using Theorems 14 and 17, we can now prove Theorem 15. The proof 
repeats that of Theorem 4 almost to the letter so it will not be repeated 
here. 

5.4 Conclusions 

This chapter introduced the OGR problem, along with nested common knowl- 
edge and the generalized centipede. These provide a unifying theory for the 
concepts and results presented in Chapters 2, 3 and 4. 

But these concepts also provide important infrastructure for solutions 
to the Generalized Ordering problem, which will be discussed in the next 
chapter. In particular, the notion of weakly solving OGR will play a central 
part. 

An interesting result presented in this chapter is Theorem 16, which
characterizes common knowledge among a group G based on a subset of nodes
in their shared pasts. Thus, neither complete information about the local
states of these processes nor complete information about their causal pasts
is needed in order to settle the scope of common knowledge among the
group's members.

Chapter 6

Generalized Ordering of 
Events 

6.1 Introduction 

Previous chapters have studied coordination under various restrictions: lin- 
ear ordering of responses, simultaneous responses, or a linear ordering of 
sets of simultaneous responses. In this chapter we remove all structural re- 
strictions on ordering and consider systems where the required ordering of 
the responses is given by any non-particularized partial order. 

As we will see, this ultimate generalization does not spawn yet more 
intricate causal structures and epistemic states. Rather, solutions require 
multiple instances of the (already defined) generalized centipedes to exist in 
triggered runs. 

We start with a concrete, if simple, example. Consider the following 
case, describing the production process for the Munchy Crunchy chocolate 
bar. 

Example 7 Charlie's Chocolate Factory produces all kinds of chocolate, 
based on distributed processes that control various machines and manufac- 
turing stages. The Munchy Crunchy is one of the chocolate bars manufac- 
tured in the plant. Those processes involved in its production are visualized 
in Figure 6.1. There are 10 different distributed processes involved in the
manufacturing, arranged into 2 initiating singleton clusters and 3
multi-process clusters. Assume that the underlying network graph relating all of
the processes is complete, and that it contains many other processes besides
those shown in the figure.

Figure 6.1: Production process for Munchy Crunchy chocolate bar

The figure describes the required manufacturing process, with arrows sig- 
nifying activation order. Two processes, each a singleton cluster, initiate 
manufacturing by sending chocolate and crunchies into the system. Each of 
these processes is controlled by a human operator. Cluster 1 contains
processes that control the input and output valves of a mixing bowl that mixes
together chocolate and crunchies. These valves must all be opened
simultaneously (this is visualized by a cycle in the ordering graph), but only if
both chocolate and crunchies are being streamed into the system. Cluster 
2 controls another mixing bowl, with only one input and one output valve. 
Here pure chocolate for the bar's coating is blended with unhealthy chemi- 
cals. Again, both valves must operate together, but only if chocolate is being 
streamed in. Finally, the processes in Cluster 3 control the coating, temper- 
ing and wrapping machines. These too must start to work simultaneously, 
but only if the mixing bowls are sending out their blends. 

The required manufacturing process described above goes beyond the 
problem formulations we have thus far seen. We are seeing not only require- 
ments for linear ordering and for simultaneity, but also multiple triggers, 
and events that are causally dependent upon multiple independent causes. 

We now define a class of problems for which the requirements graph
shown in Figure 6.1 would be an instance. Note that the graph can also be
expressed as a partial order $(V^{crunchy}, \preceq^{crunchy})$ defined on a set of events,
where

$V^{crunchy} = \{chocolate, crunchies, choc\text{-}in_1, crunch\text{-}in_1, out_1, choc\text{-}in_2, out_2, coat, temper, wrap\}$, and

the partial order $\preceq^{crunchy}$ is defined for every edge-connected pair
of processes.

In order to fully express the graph, we need to add to the partial order
a distinction between those triggering events that are spontaneous external
inputs and those that are responses to such triggers. In Example 7, the set of
triggers is $T^{crunchy} = \{chocolate, crunchies\}$. We formalize requirements
such as the one given above in the following way.

Definition 29 (Generalized Response Problem) An instance of the
generalized response problem is defined by a tuple $GR = \langle V, T, \preceq\rangle$ where

1. $V$ is a set of events,

2. $T \subseteq V$ is the set of ND external inputs in $V$, and

3. $\preceq$ is a partial order on $V$, such that every $\tau \in T$ is $\preceq$-minimal (i.e.
for every $\tau \in T$ and $e \in V$, if $e \preceq \tau$ then $e = \tau$).

A protocol $P$ solves the instance $GR = \langle V, T, \preceq\rangle$ of the Generalized
Response problem if it guarantees that in every run $r$,

1. if $e \preceq e'$ occur at $t$ and $t'$ respectively, then $t \le t'$. Moreover,

2. for every $e' \in V$, $e'$ occurs in $r$ iff for every $e \preceq e'$, $e$ occurs in $r$.
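The two conditions that a solving protocol must guarantee can be checked mechanically on a single finite run. The following is only a minimal sketch under stated assumptions: a run is summarized by a hypothetical dictionary `occurrences` mapping each event that occurs to its time, `order` lists the pairs $e \preceq e'$ with $e \ne e'$, and the second condition is read as applying to non-trigger events with respect to their strict predecessors (triggers are spontaneous and are not forced to occur). The function name and data layout are illustrative and do not come from the text.

```python
def run_satisfies_gr(V, T, order, occurrences):
    """Check the two conditions of Definition 29 on one run.

    V           -- set of event names
    T           -- set of trigger events (assumed to be a subset of V)
    order       -- set of pairs (e, e2) with e != e2, meaning e precedes e2
    occurrences -- dict: event -> time, for the events that occur in the run
    """
    # Condition 1: if e precedes e2 and both occur, then time(e) <= time(e2).
    for e, e2 in order:
        if e in occurrences and e2 in occurrences:
            if occurrences[e] > occurrences[e2]:
                return False
    # Condition 2 (assumed reading): a non-trigger event occurs
    # iff all of its strict predecessors occur.
    for e2 in V - T:
        preds = {e for (e, f) in order if f == e2 and e != e2}
        all_preds_occur = all(e in occurrences for e in preds)
        if (e2 in occurrences) != all_preds_occur:
            return False
    return True
```

For example, with $V = \{a, b, c\}$, $T = \{a\}$ and $a \preceq b \preceq c$, a run in which $a$, $b$, $c$ occur at times 0, 1, 1 passes the check, while a run in which $c$ occurs without $b$ does not.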

Consider a protocol $P^{crunchy}$ that solves the instance of GR defined by
the tuple $\langle V^{crunchy}, T^{crunchy}, \preceq^{crunchy}\rangle$. Our concern in this chapter is to
characterize the necessary causal structures that must obtain in executions
of protocols solving instances of GR, such as the protocol $P^{crunchy}$.

6.2 Condensed Representation of GR 

Each of the clusters in Example 7 contains a cycle, while the manufacturing 
requirements specify that the events in each cluster must occur
simultaneously. The next lemma shows that the existence of a cycle does indeed
guarantee simultaneity in protocols that solve GR.

Lemma 32 Let $P$ be a protocol solving $GR = \langle V, T, \preceq\rangle$. Fix $e, e' \in V$ such
that $e \preceq e'$ and $e' \preceq e$.

If $e$ occurs at $(r, t)$ then $e'$ occurs at $(r, t)$ too.

Proof From $e' \preceq e$ and the definition of GR, there must be some $t' \le t$ such
that $e'$ occurs at $(r, t')$. Again from the definition and from $e \preceq e'$, we also get
that $t \le t'$. Hence $t' = t$. ■

The GR formalization can be used to designate any required order of
events, but often, as in the case of Example 7, it is more sensible to consider
the strongly connected components in the graph as single units. Thus for
every instance $\langle V, T, \preceq\rangle$ of GR, we consider the condensed form $\langle V', T', \preceq'\rangle$,
which is derived by collapsing every strongly connected component in $(V, \preceq)$
into a single vertex, or component, $C \in V'$.

The partial order $\preceq'$ holds between $C$ and $C'$ iff there exist $e \in C$ and
$e' \in C'$ such that $e \preceq e'$. Since for every $\tau \in T$, if $e \preceq \tau$ for some $e \in V$
then $e = \tau$, the subset of triggering components is $T' = \bigcup_{\tau \in T} \{\{\tau\}\}$.
We will blur the distinction between $\{\tau\} \in T'$ and $\tau \in T$ and freely
interchange between the two forms.
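The construction of the condensed form can be sketched in a few lines. This is an illustrative sketch only: the function name, the representation of $\preceq$ by a set of directed edges, and the use of frozensets for components are assumptions of the example, and the simple reachability-based computation is chosen for brevity rather than efficiency.

```python
def condensed_form(V, T, edges):
    """Collapse the strongly connected components of (V, edges).

    Returns (components, trigger_components, component_edges), where each
    component is a frozenset of events.
    """
    succ = {v: set() for v in V}
    for e, e2 in edges:
        succ[e].add(e2)

    def reachable(src):
        # All events reachable from src, including src itself.
        seen, stack = {src}, [src]
        while stack:
            for nxt in succ[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reach = {v: reachable(v) for v in V}
    # Two events share a component iff each is reachable from the other.
    comp_of = {v: frozenset(u for u in reach[v] if v in reach[u]) for v in V}
    components = set(comp_of.values())
    trigger_components = {comp_of[t] for t in T}
    component_edges = {
        (comp_of[e], comp_of[e2])
        for e, e2 in edges
        if comp_of[e] != comp_of[e2]
    }
    return components, trigger_components, component_edges
```

On a hypothetical instance with events t, a, b, c, trigger t, and edges t to a, a to b, b to a, b to c, the events a and b collapse into a single component, and the resulting component graph is acyclic.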

Thus the condensed form $\langle V', T', \preceq'\rangle$ may be considered as an instance
of GR in its own right. In fact, we will even speak of a component $C \in V'$ as
"occurring" at time $t$, if all of its member events are simultaneously
occurring at that time. Condensed forms of directed graphs enjoy the desirable
property of containing no cycles. This property makes it easier for us to
work with them than with the original instance of the problem, and the
following lemma shows that there is no harm done, as the two forms are equivalent
with respect to protocol solutions.

Lemma 33 Let $GR = \langle V, T, \preceq\rangle$ and let $\langle V', T', \preceq'\rangle$ be the condensed form
of GR. Protocol $P$ solves GR iff it solves the condensed form.

Proof

$\Rightarrow$ Assume that $P$ solves GR. Fix a run $r$.

1. Fix $C \in V'$ and $e, e' \in C$ such that $e$ occurs at $(r, t)$. By definition
of $C$ we have that $e' \preceq e \preceq e'$. Hence it must be that $e'$ occurs at
$(r, t)$ too.

2. Fix $C, C' \in V'$ such that $C \preceq' C'$ and $C$ and $C'$ occur at $t$ and
$t'$ respectively. Then there exist $e \in C$ and $e' \in C'$ that occur
at $(r, t)$ and $(r, t')$ respectively. By definition of $\preceq'$ we have $e \preceq e'$,
and since $P$ solves GR we get that $t \le t'$.

3. Fix $C' \in V'$ and $\{C_h\}_{h=1..k}$, the set of components such that
$C_h \preceq' C'$ for all $h$. Let $e' \in C'$ and $e_h \in C_h$ for all $h$. By
definition of $\preceq'$, $e_h \preceq e'$ for all $h$. Since $P$ solves GR we get that
$e'$ occurs iff $e_h$ occurs for all $h$, and hence $C'$ occurs iff $C_h$ occurs
for all $h$.

$\Leftarrow$ The arguments pretty much repeat those in the other direction.



6.3 Generalized Ordering Requires Multiple Gen- 
eralized Centipedes 

The ordering requirement formalized by a generalized ordering problem
$\langle V, T, \preceq\rangle$ expresses the idea that an event $e \in V$ should occur iff a set of
prerequisite actions has already been performed. These in turn will have their
own set of prerequisites, etc. The next lemma reformulates the prerequisites 
for the occurrence of such an event in terms of chains of linear orderings. 
Focussing on condensed forms, we are assured that there are no cycles in the 
graph, and hence no infinite chains to reckon with. Of particular interest for 
us when considering solutions to GR are component chains, defined below. 

Definition 30 (Component chains) Let $\langle V', T', \preceq'\rangle$ be the condensed form
of GR. A component chain for $C \in V'$ is a sequence $(C_0, C_1, \ldots, C_k)$ of
alternating members of $V'$ such that $C_0 \in T'$, $C_h \preceq' C_{h+1}$ for all $h < k$, and
$C_k = C$.

Given $GR = \langle V, T, \preceq\rangle$, we say that $(C_0, C_1, \ldots, C_k)$ is a component chain
for $e \in V$ if $e \in C$ for some $C \in V'$, where $\langle V', T', \preceq'\rangle$ is the condensed form
of GR, and $(C_0, C_1, \ldots, C_k)$ is a component chain for $C$.
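As an illustration, the component chains of a component can be enumerated by walking the (acyclic) condensed graph backwards from the component to the triggers. The sketch below is an assumption-laden example: it follows only the edges it is given (so chains step along edges of the requirements graph rather than along arbitrary pairs of the transitive order), and the component names used afterwards are hypothetical labels for the clusters of Example 7.

```python
def component_chains(target, triggers, edges):
    """Enumerate chains (C_0, ..., C_k) with C_0 a trigger and C_k == target.

    target   -- the component whose chains are wanted
    triggers -- set of triggering components
    edges    -- set of pairs (C, C2) of an acyclic component graph
    """
    preds = {}
    for c, c2 in edges:
        preds.setdefault(c2, set()).add(c)

    chains = []

    def extend(chain):
        head = chain[0]
        if head in triggers:
            chains.append(tuple(chain))
        # Acyclicity of the condensed form guarantees termination.
        for p in preds.get(head, ()):
            extend([p] + chain)

    extend([target])
    return chains
```

Reading Example 7 as the component graph chocolate to cluster1, crunchies to cluster1, chocolate to cluster2, cluster1 to cluster3 and cluster2 to cluster3, the function yields exactly three chains ending at cluster3, one through each trigger into Cluster 1 and one from chocolate through Cluster 2.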

Lemma 34 Suppose that protocol $P$ solves $\langle V', T', \preceq'\rangle$, the condensed form
of GR. Fix $r \in \mathcal{R}^{max}(P, \gamma^{max})$ and let $C \in V'$ be a component that occurs
at $(r, t)$. Then for every component chain $(C_0, C_1, \ldots, C_k)$ for $C$ there exist
$t_0 \le t_1 \le \cdots \le t_k = t$ such that for every $h \le k$, $C_h$ occurs at $t_h$.

Proof Suppose that there exists a component chain $(C_0, C_1, \ldots, C_k)$ for $C$
in which the condition does not hold. Then there exists some $h < k$ where
one of the following holds:

$C_h$ does not occur in $r$: In this case, as $P$ solves $\langle V', T', \preceq'\rangle$, component
$C$ also does not occur in $r$, contrary to the assumption.

There exists some $h' > h$ such that $t_h > t_{h'}$: In this case, as $C_h \preceq' C_{h'}$,
it must be that $C_{h'}$ does not occur at all in $r$, and hence also that $C_h$
does not occur, and the case is reduced to the previous one, and thus
to a contradiction.



Lemma 34 reduces the necessary requirements for the occurrence of a 
component C into a set of linearly ordered requirements - one for each com- 
ponent chain leading back from C. Recalling the Ordered Group Response 
problem from Chapter 5 and the notion of weakly solving, we note that each
of the linearly ordered requirements just mentioned is in fact a requirement 
of the protocol that it weakly solve an instance of the OGR problem that is 
specified by the component chain. 

The following theorem formalizes this insight, giving us the necessary 
condition, in causal terms, for correct solutions to the GR problem. 

Theorem 18 Let $GR = \langle V, T, \preceq\rangle$ and let $P$ be a protocol that solves GR.
Fix $r \in \mathcal{R}^{max}(P, \gamma^{max})$, and suppose that event $e \in V$ occurs at $(r, t)$.
Then for each component chain $(\{\tau\}, C_1, \ldots, C_k)$ of $e$, there exists a
generalized centipede for $\langle\theta_0, I^1, \ldots, I^k\rangle$ in $(r, t_0..t_k)$, where $I^h$ and $t_h$ are the set of
processes where events of $C_h$ occur and their time of occurrence respectively,
and $\theta_0$ is the node where $\tau$ occurs.

Proof By Lemma 33, $P$ solves GR iff it solves the condensed form $\langle V', T', \preceq'\rangle$.
Let $C$ be the component containing $e$. By Lemma 34, in every run where $e$
occurs, for every initial component chain $(C_0 = \{\tau\}, C_1, \ldots, C_k)$ for $C$, there
exist $t_0 \le t_1 \le \cdots \le t_k = t$ such that $C_h$ occurs at $t_h$ for all $h \le k$. As the
occurrences of $C_h$ for all $h < k$ are necessary whenever component $C$ occurs, we
have by definition that $P$ weakly solves the OGR instance defined by
$(\tau, C_1, \ldots, C_k)$. By Theorem 15, there must exist a generalized centipede for
$\langle I^1, \ldots, I^k\rangle$ in $(r, t_0..t)$. ■ Theorem 18

In a protocol that solves an instance $\langle V, T, \preceq\rangle$ of GR then, whenever
an event $e \in V$ occurs, there exists a set of generalized centipedes - one
for each component chain of $e$. Returning to Example 7, this means that
in a protocol that properly controls the production process for Munchy
Crunchy bars, whenever both chocolate and crunchies are being pushed into
the production system, the minimal communication between the distributed
processes must contain all three communication structures seen in Figure 6.2.

Figure 6.2: Generalized centipedes in the Munchy Crunchy production line



6.4 Conclusions 

This chapter brings our investigation into the causal structures that underlie
coordination for the purpose of event ordering to an end. Using the generalized
centipede and the notion of weakly solving, we have shown that for any
prescribed partial ordering on events, the communication requirements in a
protocol that solves the generalized ordering can be characterized as a set
of generalized centipedes.

We use the condensed representation of the ordering graph in order to
avoid the loops that are created whenever the ordering prescribes
simultaneous occurrences. Under this representation, vertices represent the strongly
connected components of the original graph.

As we proved, each maximal path in the condensed graph represents a 
requirement for the existence of a generalized centipede in every run where 
the path's ultimate component, or cluster of simultaneous events, occurs. 
The generalization of the ordering requirements to allow for multiple trig- 
gering events is translated into communication requirements that include 
sets of such centipedes.

Chapter 7

Gaining Knowledge of 
Ignorance 

7.1 Introduction 

This chapter takes a different, complementary look at the way transmission
bounds affect knowledge and causality in distributed systems. In place of 
studying the effects of upper bounds in such systems, we will now focus upon 
lower bounds on transmission times, and how these affect knowledge gain. 
Intuitively, the existence of lower bounds makes it possible for one process 
to gain knowledge of another process's ignorance respecting a recent event. 

Such considerations seem to make more sense in an environment that is 
motivated by competition, rather than cooperation. In his book The Roth- 
schilds [33], Frederic Morton gives an account of the events in the London 
Stock Exchange at the time of the Battle of Waterloo, in which knowledge
about lower bounds on transmission times supposedly played a major role.
Morton's (disputed) account can be summarized as follows: on the night
of June 15, 1815, Nathan Rothschild, one of London's most prominent
financiers at the time, was informed by his special private couriers that the
Battle of Waterloo had been won by the British. Official word from Wellington's
men could only arrive on the next day. The next morning, Rothschild
went to the London Stock Exchange and signaled his agents to furiously sell
consols (government bonds). "He knows... [who won]" was the word among
traders. The market crashed, and just before Wellington's men arrived with
the news of victory, Rothschild signaled his agents to buy all available
consols at a fraction of their original price. He is said to have made a fortune
on that day.

For the course of events described by Morton to be plausible, it was
necessary not only for Rothschild to know the outcome before everyone
else; he also had to know that the others were ignorant of the outcome.
Otherwise, he would fear that one of his rivals could outsmart him, gradually
buy his shares and make out with a huge gain at Rothschild's expense. The
epistemic circumstances in this example are based on Rothschild's courier
system being known to have lower minimal transmission times than those of
Wellington's communication lines.

The Battle of Waterloo example above illustrates the importance of
knowledge about others' ignorance in particular circumstances. For another
example, consider a sealed-bid first-price auction for mining rights. Sup- 
pose that near the auction closing a potential bidder learns of a relevant 
event e, say that gold was found in an adjacent site. The bidder's valua- 
tion of the auctioned rights may have changed. But the decision regarding 
if, and by what amount, to alter her bid would depend on her knowledge 
about whether her competitor knows about e. In particular, if she knows 
that he is ignorant of e, then she should not increase her bid by a significant 
amount. The analysis presented will serve to show how our favored bidder 
can use her information about transmission times to figure out whether her 
competitor is ignorant of e. 

Our analysis will start by presenting a novel view of how bounds on 
message transmission times in a communication network induce causal cones 
of information flow among events in the system, in analogy with the light 
cones in Einstein-Minkowski spacetime considered in physics [16, 37].^1

Based on this probing into causal cones, we will develop the formal theory 
of knowledge of ignorance. As mentioned above, such considerations are 
more naturally understood in the context of a competitive environment. For 
this reason this chapter will not introduce any kind of cooperative ordering 
task to motivate the analysis. 

7.2 Bounded Communication and Cones of Influ- 
ence 

Consider a fixed inertial system in which all sites are at rest with respect to
each other. In such a setting, light rays carry information at a constant speed
c in Euclidean space. In terms of Einstein-Minkowski spacetime, the light rays
outgoing from an event (or a 4-dimensional point p) form a surface in
spacetime called the event's future light cone. The light rays converging on an
event form a surface called the event's past light cone. The spacetime points
within p's future light cone make up its absolute future and those within its
past light cone make up its absolute past: the former are spacetime points
that events at p can influence and the latter are the points that can influence
p. Events at points outside both light cones of p can neither influence nor
be influenced by events at p. Such events are considered independent of,
or sometimes called concurrent with, events at p. Observe that the absolute
future and absolute past cones of a point p are fixed and depend only on the
coordinates of p.

^1 Our setting can be thought of as consisting of a single inertial system, in which there
is a single, non-relativistic, notion of time for all sites.

In analogy, consider a computer network based on a specific context
where for every channel $(i, j)$ there is a fixed transmission time: $min_{ij} =
max_{ij}$. Moreover, assume that the processes follow the full-information
protocol fip in which, at every instant, they send a message describing their
whole history to all neighbors. With fixed transmission rates and constant
message sending, we would get that in every run $\theta \rightsquigarrow \theta'$ iff $\theta \dashrightarrow \theta'$. Just as
in the case of light traveling in Einstein-Minkowski spacetime, in this setting
every node $\theta$ would define a future cone $\mathrm{fut}(\theta) = \{\theta' \mid \theta \dashrightarrow \theta'\}$ and a past
cone $\mathrm{past}(\theta) = \{\theta' \mid \theta' \dashrightarrow \theta\}$, as well as nodes that are causally concurrent
with respect to $\theta$. In this section we focus primarily upon future causality,
where an intricate dynamics transforms potentiality into necessity, as we
shall soon see.

What happens when transmission times are not fixed? In purely
asynchronous settings, where $max_{ij} = \infty$ and thus messages can take arbitrarily
long to be delivered, a node $\theta'$ can be influenced by $\theta = (i, t)$ only if $\theta \mapsto \theta'$.
Thus, Lamport's relation defines a future cone (and a past cone) for every
given node. As opposed to the fixed-transmission system described above,
however, here the cone may differ significantly between different runs due
to the varying transmission times. Figure 7.1 shows the future cone of node
$\theta$ in a specific run, for an observer with complete information about the
future. The alternative futures that remain unrealized in the current run
are shown in outline. Observe that a "core" cone can be made out in the
center of $\mathrm{fut}(\theta)$, of nodes that are guaranteed a priori to be within $\mathrm{fut}(\theta)$,
and will thus necessarily be affected by $\theta$. We denote this cone by $\Box\mathrm{aff}(\theta)$.
In an asynchronous context, this core consists of the set of nodes $(i, t')$ such
that $t' \ge t$.

Figure 7.1: The future causal cones of $\theta$ in asynchronous systems and $\gamma^{max}$

The picture becomes more interesting in the presence of upper bounds
$max_{ij}$ on message transmission times. Recall that we denote by $D_{ih}$ the
shortest distance between vertices $i$ and $h$ in the max-weighted network
graph. Under the fip described above, we are guaranteed to have $(i, t) \rightsquigarrow
(h, t')$ whenever $t' \ge t + D_{ih}$. Thus, maximal transmission times extend
the inner cone into $\Box\mathrm{aff}(\theta) = \{\theta' \mid \theta \dashrightarrow \theta'\}$. As in the asynchronous case,
for every run $r \in \mathcal{R}(fip, \gamma^{max})$ and node $\theta$, necessarily $\mathrm{fut}(\theta) \supseteq \Box\mathrm{aff}(\theta)$, as
messages that are delivered earlier than the upper bound on a channel
introduce into $\mathrm{fut}(\theta)$ nodes that were not guaranteed a priori to be in
$\Box\mathrm{aff}(\theta)$.

We may also consider the set $\Box\mathrm{unaff}(\theta)$, counterbalancing $\Box\mathrm{aff}(\theta)$, and
consisting of nodes that are necessarily unaffected causally by $\theta = (i, t)$.
As long as no lower bounds are defined, this set consists of all nodes in $\theta$'s
temporal past, as well as the nodes $(j, t)$ where $j \ne i$.^2

When we move to the contexts $\gamma^{min}$ or $\gamma^{b}$, in which there
are lower bounds on transmission times, $\Box\mathrm{unaff}(\theta)$ gets a richer structure.
Lower bounds on transmission play a related, albeit somewhat different,
role than that of upper bounds. Suppose that a spontaneous event $e$ takes
place at $\theta = (i, t)$ and that, based on the lower bounds, the fastest that
communication from $i$ can reach $j$ is $d_{ij}$.^3 If $\theta' = (j, t')$ where $t' < t + d_{ij}$,
then events at $\theta'$ cannot be causally influenced by $e$. It follows that the
$\Box\mathrm{unaff}(\theta)$ region is now defined as the set $\{(j, t') : t' < t + d_{ij}\}$. Figure 7.2
shows the causal cones of $\theta$ in $\gamma^{b}$.
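The two a priori regions can be computed directly from the channel bounds. The sketch below is illustrative only and rests on assumptions not fixed by the text: processes are numbered 0..n-1, times and bounds are integers, the distances $D_{ij}$ and $d_{ij}$ are obtained by a standard Floyd-Warshall pass over the max-weighted and min-weighted channel graphs, and the function names are hypothetical. As in the text, the "necessarily affected" classification presupposes the fip.

```python
import math


def shortest_distances(n, weights):
    """All-pairs shortest distances over directed channels (Floyd-Warshall).

    weights -- dict {(i, j): w} giving the bound on channel i -> j
    """
    dist = [
        [0 if i == j else weights.get((i, j), math.inf) for j in range(n)]
        for i in range(n)
    ]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def classify(theta, node, D, d):
    """Place node = (j, t2) relative to theta = (i, t), a priori.

    D -- shortest distances in the max-weighted graph (upper bounds)
    d -- shortest distances in the min-weighted graph (lower bounds)
    """
    (i, t), (j, t2) = theta, node
    if t2 >= t + D[i][j]:
        return "necessarily affected"    # inside the inner cone
    if t2 < t + d[i][j]:
        return "necessarily unaffected"  # too early for any message chain
    return "undetermined"                # depends on the particular run
```

With hypothetical upper bounds 3 and 2 on channels 0 to 1 and 1 to 2 (and 10 on the direct channel 0 to 2), and lower bounds 1, 1 and 4 on the same channels, a node of process 2 is necessarily affected by (0, 10) from time 15 on, necessarily unaffected before time 12, and undetermined in between.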

Figure 7.2: The future causal cones of $\theta$ in $\gamma^{b}$

We have considered the sets $\mathrm{fut}(\theta)$, $\Box\mathrm{aff}(\theta)$ and $\Box\mathrm{unaff}(\theta)$, which are all
easily determined given complete information regarding the run's infinite
execution. To be of practical use, however, we should consider whatever it is
that can be made known about causal influence, given a specific "present"
point in time $t'$ and assuming that future events in the run are as yet
undetermined. We define $\mathrm{fut}(\theta, t')$ as the set $\{(j, t'') \mid \theta \rightsquigarrow (j, t'')$ and $t'' \le t'\}$,
the set of nodes that have, by time $t'$, already been realized as a part of $\theta$'s
future.

^2 We assume that messages between processes are not instantaneous and spend at least
one time unit in transmission.

^3 In analogy to the definition of the $D_{ij}$ values, $d_{ij}$ is defined as the shortest distance
between $i$ and $j$ in the min-weighted network graph.

The portion of the run realized by time $t'$ determines the sets of
necessarily affected and unaffected nodes relative to the current time, in a way that
extends them beyond $\Box\mathrm{aff}(\theta)$ and $\Box\mathrm{unaff}(\theta)$, respectively. The set $\Box\mathrm{aff}(\theta, t')$
of all nodes that are guaranteed to be causally affected by $\theta$ given $\mathrm{fut}(\theta, t')$
is the union of the $\Box\mathrm{aff}(\theta')$ cones of all $\theta' \in \Box\mathrm{aff}(\theta, t')$. As we already have
$\theta \rightsquigarrow (k, t)$, this suffices to ensure that $(j, t'') \in \mathrm{fut}(\theta)$. We denote with
$\Diamond\mathrm{unaff}(\theta, t')$ the set of nodes that are potentially unaffected by $\theta$ relative to
current time $t'$. This set is the complement of the set $\Box\mathrm{aff}(\theta, t')$.

A more challenging definition is that of the set $\Diamond\mathrm{aff}(\theta, t')$ of nodes that,
at time $t'$, are potentially affected by $\theta$. A node $\theta'$ is potentially affected if it
is possible, given $\mathrm{fut}(\theta, t')$, that the current run will evolve so as to include
9' in fut(6'). This set is inductively defined: If 6* = {i,t) then Oaff(6',t) = 
> t + dij}, and Oaff(0,t') = U(M-)eOaff(e,t'-i){(j, > * + 
dkj} for t' > t. The set of necessarily unaffected nodes □unafF(^,t') is the 
complement of $\Diamond\mathrm{aff}(\theta, t')$.

Observe that with time, as larger portions of the run get realized, the set
of nodes that are neither necessarily affected by $\theta$ nor necessarily unaffected
by it, given by $\Diamond\mathrm{aff}(\theta, t') \cap \Diamond\mathrm{unaff}(\theta, t')$, monotonically shrinks. This can
be visualized by comparing the state of the cones in Figure 7.2 with that of
Figure 7.3, which displays the same run at a later point in time. It is the case
that at time $t'$ every node in the time interval $[t, t']$ is either in $\Box\mathrm{aff}(\theta, t')$ or
in $\Box\mathrm{unaff}(\theta, t')$. Moreover, the $\Box\mathrm{aff}(\theta, t')$ cone and $\Box\mathrm{unaff}(\theta, t')$ region grow
monotonically with $t'$.




Figure 7.3: The regions necessarily affected and unaffected by $\theta$ in $\gamma^{b}$, w.r.t.
time $t' > t$

In summary, while light cones define fixed regions of influence and con- 
currency, communication dynamically determines the cones of influence and 
their complements. 

7.3 Transmission Guarantees and Knowledge of 
Ignorance 

Cones of influence and information flow as discussed in the previous section 
are clearly closely related to knowledge about knowledge and to knowledge 
about ignorance. In this section we build on the cones interpretation to 
analyze the dynamics of what would probably be best termed as "knowledge 
gain about ignorance". 

For the following analysis we introduce some variations in the formal
language that is used in the proofs. In place of logical operators whose
validity is dependent upon system, run and time (the $(\mathcal{R}, r, t)$ on the left-hand
side of the satisfaction relation $\models$), this chapter utilizes time-stamped operators
that are dependent only upon a system and a specific run.

The set $\Phi$ of primitive propositions consists of the propositions $occurred_t(e)$
for all events $e$ and times $t$, and the propositions $\theta \mapsto \theta'$ for all pairs of
process-time nodes. The logical language $\mathcal{L}$ is obtained by closing $\Phi$ under
propositional connectives and knowledge formulas. We write $\theta \mapsto \theta'$ instead
of $\mapsto(\theta, \theta')$. Our knowledge operators are indexed by a node $\theta = (i, t)$,
and so are time stamped. Thus, $\Phi \subseteq \mathcal{L}$, and if $\varphi \in \mathcal{L}$, $i \in \mathbb{P}$ and $t$ is a time,
then $K_{(i,t)}\varphi \in \mathcal{L}$. The formula is read: process $i$ at time $t$ knows $\varphi$.

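We write $(\mathcal{R}, r) \models \varphi$ to state that $\varphi$ holds in the run $r$, with respect to
system $\mathcal{R}$. We write $r \sim_{(i,t)} r'$ whenever process $i$'s local state at time $t$ in $r$
is identical to its local state at time $t$ in run $r'$, and inductively define

$(\mathcal{R}, r) \models \theta \mapsto \theta'$ iff $\theta \mapsto \theta'$ in the run $r$;

$(\mathcal{R}, r) \models occurred_t(e)$ iff the event $e$ occurs in $r$ by time $t$; and

$(\mathcal{R}, r) \models K_{(i,t)}\varphi$ iff $(\mathcal{R}, r') \models \varphi$ for every run $r'$ satisfying $r \sim_{(i,t)} r'$.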

Propositional connectives are handled in the standard way, and their
clauses are omitted above. Despite the slight variance in nomenclature,
$K_{(i,t)}\varphi$ still follows [15] in being satisfied if $\varphi$ holds at all points at which $i$
has the same local state as it does at time $t$. Thus, given $\mathcal{R}$, the local state
determines what processes know. Note that $(\mathcal{R}, r) \models occurred_t(e)$ holds
iff $(\mathcal{R}, r, t') \models At_t\, occurred(e)$ for any time $t'$. Similarly, $(\mathcal{R}, r) \models K_{(i,t)}\varphi$ iff
$(\mathcal{R}, r, t') \models At_t K_i\varphi$ for any time $t'$. So for the most part, this chapter's formal
semantics is but an adaptation of those introduced in Chapter 1.

The motivation here is twofold. First, the timestamped language allows 
us to simplify the presentation. Second, the use of timestamped epistemic
operators and the introduction of the Lamport relation into the formal
language implies greater expressivity that we hope will foster new insights
into the study of causation in distributed systems. 

Our analysis here will be performed within the context $\gamma^{min}$, in which
lower bounds on message transmission times are available. Recall that in
$\gamma^{min}$ there are no upper bounds on message transmission times; messages can
take an arbitrarily long amount of time to be delivered. In these settings,
Lemma 10 tells us that nested knowledge implies a message chain linking
the processes. The converse, shown below in Lemma 35, states that under
fip, such a message chain implies nested knowledge.

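Lemma 35 Let $\mathcal{R} = \mathcal{R}(fip, \gamma^{min})$. Assume $e$ is an ND event occurring at
$(i_0, t)$ in $r$. If there is a chain $(i_0, t) \mapsto (i_1, t_1) \mapsto \cdots \mapsto (i_k, t_k)$ in $(r, t..t')$,

then $(\mathcal{R}, r, t') \models K_{i_k} K_{i_{k-1}} \cdots K_{i_1}(occurred(e) \wedge ND(e))$.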

Proof The proof is arrived at by first noting that Lemma 14 implies the
following for $\gamma^{min}$: if $(\mathcal{R}, r, t) \models K_i\varphi$ and $(i, t) \mapsto (j, t')$ in $r$, then $(\mathcal{R}, r, t') \models
K_j(At_t K_i\varphi)$. Then, repeated applications of this result give us the required
outcome. ■

In the rest of this section, we will focus on how different cones of
influence combine to determine when a process knows that another process is
ignorant about an event of interest. We will give a complete characterization
of this question for the fip and draw implications from this to the general
case of arbitrary protocols.

Recall the sealed-bid first-price auction described in the Introduction.
Our bidder is named $i_2$, her competitor is $i_1$, and the bids need to be
in by time $t_1$. Moreover, $i_2$ must decide on her bid at time $t_2$. Finally,
the event $e$ in which information about a newly found gold mine was
disclosed occurred at $\theta_0 = (i_0, t_0)$. The goal, then, is to determine whether
$K_{\theta_2} \neg K_{\theta_1} occurred_{t_0}(e)$.

Given that processes following fip have the perfect recall property, the 
following lemma shows that the knowledge state of a process determines its 
causal past. 

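Lemma 36 Fix $r \in \mathcal{R}(fip, \gamma^{min})$ and nodes $\theta_0, \theta_1, \theta_2$ such that $i_0 \ne i_1$.
$(\mathcal{R}, r) \models K_{\theta_2} \mapsto(\theta_0, \theta_1)$ iff $\theta_0 \mapsto \theta_1 \mapsto \theta_2$ in $r$.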



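Proof

$\Rightarrow$ If $\theta_0 \not\mapsto \theta_1$ then $(\mathcal{R}, r) \not\models K_{\theta_2} \mapsto(\theta_0, \theta_1)$ by the Knowledge Axiom,
immediately contradicting the lemma's assumptions.

Suppose now that $\theta_0 \mapsto \theta_1$ but that $\theta_1 \not\mapsto \theta_2$. Since $i_0 \ne i_1$, agent
$i_1$ must receive a message at some point $t'_1 \in (t_0, t_1]$. Since in $\gamma^{min}$
every message receive is a nondeterministic event, using Lemma 10 we
get that $(\mathcal{R}, r) \not\models K_{\theta_2} \mapsto(\theta_0, \theta_1)$. By definition of $\models$ there exists a run
$r'$ such that $(\mathcal{R}, r') \not\models\ \mapsto(\theta_0, \theta_1)$, and hence $\theta_0 \notin \mathrm{past}(r', \theta_1)$,
again leading to contradiction.

$\Leftarrow$ $(\theta_0, \theta_1)$ is an edge in $\mathrm{past}(r, \theta_2)$. From perfect recall we get that
$(\theta_0, \theta_1)$ is an edge in $\mathrm{past}(r', \theta_2)$ for all $r' \sim_{\theta_2} r$. By definition of
past, $\theta_0 \mapsto \theta_1$ in every such $r'$, and therefore $(\mathcal{R}, r) \models K_{\theta_2} \mapsto(\theta_0, \theta_1)$.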



As discussed in Chapter 1, the meaning of the lower bounds $min_{ij}$ in
the Net labeled graph component of the context is that certain message
chains, in which messages travel faster than the lower bounds specify, are
impossible. We say that a sequence of nodes $\theta_0,\theta_1,\ldots,\theta_m$, where $\theta_h = (i_h,t_h)$, is a legal message
chain with respect to Net if for every $h < m$ we have (i) $t_h < t_{h+1}$ and (ii)
if $i_h \neq i_{h+1}$ then $(i_h,i_{h+1})$ is a channel in Net, and $(t_{h+1} - t_h) \geq min_{i_h i_{h+1}}$.
Clearly, for every legal message chain, there is a run of $\gamma^{\min}$ with network
Net in which this message chain is realized, and $\theta_0 \rightsquigarrow \theta_1 \rightsquigarrow \cdots \rightsquigarrow \theta_m$.
Conversely, if $\theta \rightsquigarrow \theta'$ in a run $r$, then there is a legal message chain starting
at $\theta$ and ending at $\theta'$ that is a causal chain in $r$.
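
The two conditions for a legal message chain can be checked mechanically. The following is a minimal sketch, not part of the original analysis: it assumes discrete integer time, encodes a node as a (process, time) pair, and encodes Net as a dictionary from directed channels to lower bounds. The names `Node`, `Net` and `is_legal_chain` are hypothetical, and condition (ii) is read as "the delay is at least the lower bound".

```python
from typing import Dict, List, Tuple

Node = Tuple[str, int]             # hypothetical encoding: (process, time)
Net = Dict[Tuple[str, str], int]   # channel (i, j) -> lower bound min_ij


def is_legal_chain(chain: List[Node], net: Net) -> bool:
    """Check conditions (i) and (ii) for every pair of consecutive nodes."""
    for (i, t), (j, u) in zip(chain, chain[1:]):
        if not t < u:              # (i) time must strictly increase
            return False
        if i != j:                 # (ii) a hop between distinct processes ...
            if (i, j) not in net:  # ... must use a channel of Net
                return False
            if u - t < net[(i, j)]:  # ... and must not beat the lower bound
                return False
    return True


if __name__ == "__main__":
    net = {("a", "b"): 2, ("b", "c"): 1}
    print(is_legal_chain([("a", 0), ("b", 2), ("c", 3)], net))
    print(is_legal_chain([("a", 0), ("b", 1)], net))
```

In this toy network the first chain respects both lower bounds, while the second would require a message from a to b to arrive after one time unit, which the bound of 2 rules out.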

By Lemma 10 we have that $K_{\theta_2}\neg K_{\theta_1}\mathit{occurred}_{t_1}(e)$ will hold if $\theta_2$ knows
that $\theta_0 \not\rightsquigarrow \theta_1$ in the current run. We now formalize the required conditions,
based on causal cones and legal message chains.

Definition 31 (Set of legal paths) We denote by $\mathsf{Legal}_{\theta_0\theta_1}$ the set of
legal message chains starting at $\theta_0$ and ending at $\theta_1$.

$\mathsf{Legal}_{\theta_0\theta_1}$ consists of all message chains that are both within the region $\Diamond\mathsf{aff}(\theta_0)$ of
nodes that can possibly be affected by $\theta_0$, and in the analogous region of
nodes that can possibly affect $\theta_1$. See Figure 7.4.

Definition 32 (Cut) A $\theta_0\theta_1$-cut is a set of nodes $C$ that appear in the
paths of $\mathsf{Legal}_{\theta_0\theta_1}$ that intersects every path in $\mathsf{Legal}_{\theta_0\theta_1}$.

The cut $C$ is called $\theta_0$-clean in run $r$ if $\theta_0 \not\rightsquigarrow c$ in $r$, for every $c \in C$.

Figure 7.4 depicts three different $\theta_0\theta_1$-cuts.
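
Both notions in Definition 32 can be tested on a small example. The sketch below is illustrative only and rests on assumptions that are not in the text: time is discrete, a node is a (process, time) pair, Net maps directed channels to lower bounds, and a run is summarized by its list of delivered messages. All function names are hypothetical. The cut test looks for a legal chain that avoids the candidate set, and it does not check the additional requirement that the nodes of the cut lie on paths of the legal set. The cleanliness test treats the starting node as reaching itself.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[str, int]             # hypothetical encoding: (process, time)
Net = Dict[Tuple[str, str], int]   # channel (i, j) -> lower bound min_ij
Msg = Tuple[Node, Node]            # delivered message: (send node, receive node)


def legal_steps(node: Node, net: Net, horizon: int) -> List[Node]:
    """Nodes that may directly follow `node` on a legal chain, up to `horizon`."""
    i, t = node
    steps = [(i, u) for u in range(t + 1, horizon + 1)]   # stay on process i
    for (src, dst), lower in net.items():
        if src == i:                                      # hop over channel (i, dst)
            steps += [(dst, u) for u in range(t + max(lower, 1), horizon + 1)]
    return steps


def is_cut(cut: Set[Node], theta0: Node, theta1: Node, net: Net) -> bool:
    """True iff no legal chain from theta0 to theta1 avoids `cut`."""
    if theta0 in cut or theta1 in cut:
        return True
    stack, seen = [theta0], {theta0}
    while stack:
        node = stack.pop()
        if node == theta1:
            return False           # found a legal chain that misses the cut
        for nxt in legal_steps(node, net, theta1[1]):
            if nxt not in cut and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True


def is_clean(cut: Set[Node], theta0: Node, messages: List[Msg], horizon: int) -> bool:
    """True iff theta0 does not causally reach any node of `cut` in the run."""
    reached, frontier = {theta0}, [theta0]
    while frontier:
        i, t = frontier.pop()
        succ = [(i, t + 1)] if t < horizon else []        # local passage of time
        succ += [dst for src, dst in messages if src == (i, t)]
        for n in succ:
            if n not in reached:
                reached.add(n)
                frontier.append(n)
    return cut.isdisjoint(reached)


if __name__ == "__main__":
    net = {("a", "c"): 1, ("c", "b"): 1}
    theta0, theta1 = ("a", 0), ("b", 3)
    cut = {("c", 1), ("c", 2)}
    print(is_cut(cut, theta0, theta1, net))
    print(is_cut({("c", 1)}, theta0, theta1, net))
    print(is_clean(cut, theta0, [], 3))
    print(is_clean(cut, theta0, [(("a", 1), ("c", 2))], 3))
```

In this example every legal chain from (a, 0) to (b, 3) must visit process c at time 1 or 2, so those two nodes form a cut, whereas (c, 1) alone does not. The cut is clean in a run where a sends nothing that arrives by then, and stops being clean once a message from (a, 1) is delivered at (c, 2).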




Figure 7.4: The set $\mathsf{Legal}_{\theta_0\theta_1}$ of possible chains from $\theta_0$ to $\theta_1$

Lemma 36 tells us that in order for a process to be informed of any
communication link between two distinct sites, the site on the receiving side 
must be in the process's past causal cone. The above discussion suggests 
that the existence of a clean cut, in this cone, on the paths between these 
sites is of importance. Moreover, we should be looking for cuts that are 
somehow more "recent". The following definition picks up on this intuition.

Definition 33 (Causal front) Fix nodes $\theta_0,\theta_1,\theta_2$. The causal front of
$\theta_0\theta_1$ with respect to $\theta_2$ in run $r$, denoted by $\mathsf{Front}_{\theta_2}(r,\theta_0,\theta_1)$, is the set of
nodes

$\{\varphi \mid \varphi$ is on some chain in $\mathsf{Legal}_{\theta_0\theta_1}$ and $\exists \Psi \in \mathsf{Legal}_{\varphi\theta_1}$ s.t. $\Psi \cap \mathsf{past}(\theta_2) = \{\varphi\}\}$

Let $\Psi$ be a legal message chain connecting $\theta_0$ and $\theta_1$ that
is also, at least in part, within the scope of $\mathsf{past}(\theta_2)$. By definition of
$\mathsf{Front}_{\theta_2}(r,\theta_0,\theta_1)$, it will contain a "latest contact point" $\psi$ of $\theta_2$ with the nodes
of $\Psi$. So, as far as $i_2$ knows at time $t_2$, it is possible that $\psi \rightsquigarrow \theta_1$. Now if it
is also the case that $\theta_0 \rightsquigarrow \psi$, then a communication path between $\theta_0$ and $\theta_1$
has been established. There is a certain subtlety involved in the definition.
The fact that $(i,t)$ is in $\mathsf{Front}_{\theta_2}(r,\theta_0,\theta_1)$ does not mean that $(i,t')$ is not
in the front for $t' > t$. We can still have $(i,t') \in \mathsf{Front}_{\theta_2}(r,\theta_0,\theta_1)$ for some
$t' > t$, if each of the nodes $(i,t)$ and $(i,t')$ constitutes a latest contact point
for some potential path to $\theta_1$.
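
To make the "latest contact point" reading concrete, the causal front can be computed on a toy instance. This is a sketch under stated assumptions rather than the construction used in the proofs: time is discrete, nodes are (process, time) pairs, Net maps directed channels to lower bounds, and the run is represented only by its delivered messages. The helper names (`legal_steps`, `reachable`, `past`, `causal_front`) are hypothetical, and the past of a node is taken to include the node itself.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Node = Tuple[str, int]             # hypothetical encoding: (process, time)
Net = Dict[Tuple[str, str], int]   # channel (i, j) -> lower bound min_ij
Msg = Tuple[Node, Node]            # delivered message: (send node, receive node)


def legal_steps(node: Node, net: Net, horizon: int) -> List[Node]:
    """Nodes that may directly follow `node` on a legal chain, up to `horizon`."""
    i, t = node
    steps = [(i, u) for u in range(t + 1, horizon + 1)]
    for (src, dst), lower in net.items():
        if src == i:
            steps += [(dst, u) for u in range(t + max(lower, 1), horizon + 1)]
    return steps


def reachable(src: Node, dst: Node, net: Net,
              allowed: Optional[Callable[[Node], bool]] = None) -> bool:
    """Is there a legal chain from src to dst whose nodes after src pass `allowed`?"""
    stack, seen = [src], {src}
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for nxt in legal_steps(node, net, dst[1]):
            if nxt not in seen and (allowed is None or allowed(nxt)):
                seen.add(nxt)
                stack.append(nxt)
    return False


def past(theta2: Node, messages: List[Msg]) -> Set[Node]:
    """Nodes that causally precede theta2 in the run (theta2 included)."""
    result, frontier = {theta2}, [theta2]
    while frontier:
        i, t = frontier.pop()
        preds = [(i, t - 1)] if t > 0 else []             # earlier local node
        preds += [src for src, dst in messages if dst == (i, t)]
        for p in preds:
            if p not in result:
                result.add(p)
                frontier.append(p)
    return result


def causal_front(theta0: Node, theta1: Node, theta2: Node,
                 net: Net, messages: List[Msg]) -> Set[Node]:
    """Nodes of past(theta2) on a legal theta0-theta1 chain that can still
    reach theta1 along a legal chain leaving past(theta2) immediately."""
    seen_by_2 = past(theta2, messages)
    front = set()
    for phi in seen_by_2:
        on_chain = reachable(theta0, phi, net) and reachable(phi, theta1, net)
        if on_chain and reachable(phi, theta1, net,
                                  allowed=lambda n: n not in seen_by_2):
            front.add(phi)
    return front


if __name__ == "__main__":
    net = {("a", "c"): 1, ("c", "b"): 1, ("c", "d"): 1}
    run = [(("c", 1), ("d", 2))]   # the only delivered message in this toy run
    print(causal_front(("a", 0), ("b", 3), ("d", 3), net, run))
```

Here process d has heard from c only at time 1, so the front with respect to (d, 3) is the single node (c, 1). Since a legal chain from (a, 0) to (b, 3) may pass through (c, 2) instead, this front does not intersect every legal chain.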




Figure 7.5: Thick marking gives a schematic view of $\mathsf{Front}_{\theta_2}(r,\theta_0,\theta_1)$

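We are now ready to characterize knowledge of ignorance in $\gamma^{\min}$, by
showing that it reduces to existence of a "$\theta_0$-clean" cut in the causal front:

Theorem 19 Let $r \in \mathcal{R}(\mathsf{fip},\gamma^{\min})$ and denote $F = \mathsf{Front}_{\theta_2}(r,\theta_0,\theta_1)$. Then
$(\mathcal{R},r) \models K_{\theta_2}(\theta_0 \not\rightsquigarrow \theta_1)$ iff both (a) $F$ is $\theta_0$-clean, and (b) $F$ is a $\theta_0\theta_1$-cut.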

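Proof

$\Rightarrow$: Suppose, to the contrary, that $F$ is either not $\theta_0$-clean, or is not a
$\theta_0\theta_1$-cut. Choose a run $r' \in \mathcal{R}$ such that

• $\mathsf{past}(r',\theta_2) = \mathsf{past}(r,\theta_2)$, and where
• all messages sent and delivered outside $\mathsf{past}(r,\theta_2)$ have minimal
transmission times.

That such a run exists is given by $\mathcal{R}$ being a representing system and
by the non-dependence of nondeterministic events on the past of the
run in which they occur. By Lemma 6 we get that $r \sim_{\theta_2} r'$, and hence
that $(\mathcal{R},r') \models \theta_0 \not\rightsquigarrow \theta_1$. Moreover, as $\mathsf{past}(r',\theta_2) = \mathsf{past}(r,\theta_2)$ we also
have that $\mathsf{Front}_{\theta_2}(r',\theta_0,\theta_1) = \mathsf{Front}_{\theta_2}(r,\theta_0,\theta_1) = F$. We now have
two choices:

$F$ is not a $\theta_0\theta_1$-cut: Then there exists

$\Psi = (\theta_0 = \psi_0, \psi_1, \ldots, \psi_n = \theta_1) \in \mathsf{Legal}_{\theta_0\theta_1}$

such that $\Psi \cap \mathsf{past}(r',\theta_2) = \emptyset$. By definition of $r'$ we get that
$\psi_0 \rightsquigarrow \psi_1 \rightsquigarrow \cdots \rightsquigarrow \psi_n$ in $r'$, and hence that $(\mathcal{R},r') \models \theta_0 \rightsquigarrow \theta_1$,
contradiction.

$F$ is not $\theta_0$-clean: In this case there exists

$\Psi = (\theta_0 = \psi_0, \psi_1, \ldots, \psi_n = \theta_1) \in \mathsf{Legal}_{\theta_0\theta_1}$

and some $k < n$ such that $\psi_0 \rightsquigarrow \psi_k$ and $(\psi_{k+1},\ldots,\psi_n) \cap \mathsf{past}(r',\theta_2) = \emptyset$.
Again by definition of $r'$ we get that $\psi_{k+1} \rightsquigarrow \cdots \rightsquigarrow \psi_n$ in $r'$.
We obtain that $\psi_0 \rightsquigarrow \psi_n$ and hence that $(\mathcal{R},r') \models \theta_0 \rightsquigarrow \theta_1$, again
contradicting the assumption.

$\Leftarrow$: Suppose that $(\mathcal{R},r) \not\models K_{\theta_2}(\theta_0 \not\rightsquigarrow \theta_1)$. Then there exists a run $r'$ such
that $r \sim_{\theta_2} r'$, where $(\mathcal{R},r') \models \theta_0 \rightsquigarrow \theta_1$. Let $\Psi = (\psi_0,\psi_1,\ldots,\psi_n)$ be a
sequence such that $\theta_0 = \psi_0 \rightsquigarrow \psi_1 \rightsquigarrow \cdots \rightsquigarrow \psi_n = \theta_1$ in $r'$.

It follows that $\Psi \in \mathsf{Legal}_{\theta_0\theta_1}$. Since $F$ is a $\theta_0\theta_1$-cut, there must exist
some $\varphi \in F \cap \Psi$. Since $\varphi \in \Psi$ we get that $\theta_0 \in \mathsf{past}(r',\varphi)$. Since $F$ is a
$\theta_2$ causal front we have that $\varphi \in \mathsf{past}(r,\theta_2)$, and as $r \sim_{\theta_2} r'$ and $\gamma^{\min}$
is causally traced we also obtain that $\mathsf{past}(r',\varphi) = \mathsf{past}(r,\varphi)$. This
gives us that $\theta_0 \rightsquigarrow \varphi$ in $r$ too, contradicting the assumption that $F$ is
$\theta_0$-clean in $r$.

■ Theorem 19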

Theorem 19 characterizes knowledge of non-causality under fip in a
system with lower bounds on transmission times. Based on the knowledge gain
theorem, we can translate this into conditions on when a process will know
that another process is ignorant of the occurrence of an event of interest.
Consider an event $e_0$ that can occur only at $i_0$. We are interested in when
$K_{\theta_2}\neg K_{\theta_1}\mathit{occurred}_{t_1}(e_0)$ holds. Clearly, if $i_2$ knows that $e_0$ did not take place,
then it would know that $i_1$ does not know that $e_0$ took place.

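Theorem 19 provides a condition enabling knowledge at $\theta_2$ that $\theta_0 \not\rightsquigarrow \theta_1$.
Suppose that $\theta_0 = (i_0,t_0)$. Since $\rightsquigarrow$ is transitive and $\theta_0 \rightsquigarrow \theta'$ for every later
node $\theta' = (i_0,t')$ of $i_0$ with $t' > t_0$, however, $\theta_0 \not\rightsquigarrow \theta_1$ implies
that $\theta' \not\rightsquigarrow \theta_1$ for all such $\theta'$. So, by Lemma 10, $\theta_1$ could not
have knowledge that $e_0$ happened at any time after $t_0$ either. Combining these
observations, we are able to obtain a tight characterization of knowledge
about ignorance regarding the occurrence of a nondeterministic event: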

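Theorem 20 (Knowledge of Ignorance Theorem) Let $r \in \mathcal{R}(\mathsf{fip},\gamma^{\min})$,
fix a node $\theta_2$, and let $e_0$ be a nondeterministic $i_0$-event. Let $t'$ be the latest
time for which $(\mathcal{R},r) \models K_{\theta_2}\neg\mathit{occurred}_{t'}(e_0)$ holds, and denote $\theta_0 = (i_0,t'+1)$.

Then

$(\mathcal{R},r) \models K_{\theta_2}\neg K_{\theta_1}\mathit{occurred}_{t_1}(e_0)$ iff $\mathsf{Front}_{\theta_2}(r,\theta_0,\theta_1)$ is a $\theta_0$-clean
$\theta_0\theta_1$-cut.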

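Proof

$\Rightarrow$: We will prove the contrapositive. Suppose that $F$ is not a $\theta_0\theta_1$-cut or
that it is not $\theta_0$-clean. Note that in particular, this means that $\theta_0 \neq \theta_1$
and hence that $t_0 < t_1$. As $t_0$ is the latest time for which $(\mathcal{R},r) \models
K_{\theta_2}\neg\mathit{occurred}_{t_0-1}(e_0)$ holds, there must exist a run $r' \sim_{\theta_2} r$ where $e_0$
occurs at $(i_0,t_0)$. As $\gamma^{\min}$ is causally traced we get that $\mathsf{past}(r',\theta_2) =
\mathsf{past}(r,\theta_2)$ and hence that $\mathsf{Front}_{\theta_2}(r',\theta_0,\theta_1) = \mathsf{Front}_{\theta_2}(r,\theta_0,\theta_1) = F$.
Theorem 19 now shows that $(\mathcal{R},r') \not\models K_{\theta_2}(\theta_0 \not\rightsquigarrow \theta_1)$. So there must
exist a run $r'' \in \mathcal{R}$ such that $r'' \sim_{\theta_2} r'$, where $(\mathcal{R},r'') \models \theta_0 \rightsquigarrow \theta_1$. As
the processes are following fip we get, using Lemma 35, that $(\mathcal{R},r') \models
K_{\theta_1}\mathit{occurred}_{t_0}(e_0)$. Since $t_0 < t_1$ we get $(\mathcal{R},r') \models K_{\theta_1}\mathit{occurred}_{t_1}(e_0)$.
Finally, since $r'' \sim_{\theta_2} r' \sim_{\theta_2} r$ we get that $(\mathcal{R},r) \not\models K_{\theta_2}\neg K_{\theta_1}\mathit{occurred}_{t_1}(e_0)$,
contradicting our assumptions.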


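$\Leftarrow$: Choose an arbitrary $r' \in \mathcal{R}$ such that $r \sim_{\theta_2} r'$. We consider three
options for the occurrence of event $e_0$:

• $e_0$ does not occur in run $r'$: in this case we get, in particular, that
$(\mathcal{R},r') \models \neg K_{\theta_1}\mathit{occurred}_{t_1}(e_0)$.

• $e_0$ occurs before time $t_0$: in this case we obtain a contradiction
to the theorem's assumption that $(\mathcal{R},r) \models K_{\theta_2}\neg\mathit{occurred}_{t_0-1}(e_0)$.

• $e_0$ occurs at some time $t' \geq t_0$: from $r \sim_{\theta_2} r'$ and Lemma 36
we get that $\mathsf{Front}_{\theta_2}(r',\theta_0,\theta_1) = \mathsf{Front}_{\theta_2}(r,\theta_0,\theta_1) = F$.
Theorem 19 is now used to show that $(\mathcal{R},r') \models K_{\theta_2}(\theta_0 \not\rightsquigarrow \theta_1)$, and
thus that $(\mathcal{R},r') \models \theta_0 \not\rightsquigarrow \theta_1$. By definition of $\rightsquigarrow$ we also get
that $(i_0,t') \not\rightsquigarrow \theta_1$. Using Lemma 10 we conclude that $(\mathcal{R},r') \models
\neg K_{\theta_1}\mathit{occurred}_{t_1}(e_0)$.

We showed that $(\mathcal{R},r') \models \neg K_{\theta_1}\mathit{occurred}_{t_1}(e_0)$ for all $r'$ such that $r \sim_{\theta_2} r'$. By
definition of $\models$ we get that $(\mathcal{R},r) \models K_{\theta_2}\neg K_{\theta_1}\mathit{occurred}_{t_1}(e_0)$, as required.

■ Theorem 20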



7.4 Conclusions 

While in timing-based algorithms such as clock-synchronization algorithms 
[22, 1] lower bounds on transmission times are typically of limited impact, 
our thesis in the current chapter is that lower bounds play a crucial role 
in determining knowledge about ignorance. This, in turn, can be of value 
in players' considerations in non-cooperative settings. In this chapter we
characterized when knowledge about ignorance is obtained in runs of the
full-information protocol, in the presence of lower bounds. A natural
question involves characterizing knowledge of ignorance for general protocols,
or in strategic settings in which a player has uncertainty concerning other
players' strategies. Our results have natural implications for more general
settings: if Alice knows that even under the full-information protocol Bob
cannot know about Charlie, then she may be able to conclude the same even 
under lesser communication. But the analysis required for the general ques- 
tion is more subtle, since Alice could hear from intermediate points without 
having full knowledge of what information they have. There is considerable 
room for further exploration of this point. 

This chapter also draws an analogy between the causal cones that are 
formed by information in synchronous systems with bounds, and the notion 
of causal light-cones in physics. The invariance of the speed of light causes 
the causal cone of a given point in 4-dimensional Einstein-Minkowski space- 
time to be fixed a priori and not change as time proceeds. In contrast, in 
the digital space of communication networks, upper bounds induce a region 
of points that are definitely affected by a spontaneous event occurring at 
a given point, while lower bounds define a region of points guaranteed to 
not be affected. For a given point $\theta = (i,t)$, these regions grow with time,
converging at the end of time to form the set of points actually affected
by $\theta$. We used this view to motivate our analysis of knowledge of ignorance.
We believe that further study of the causal cones and their evolution over
time will provide insights into the fundamental properties of synchronous
environments.

Chapter 8

Discussion 



This thesis investigates causality and coordination in distributed systems. It
extends Lamport's original paper on causality [26] by looking into causality
in synchronous systems. As we saw in Chapter 3, our results provide a
generalization of Lamport's work, in the sense that asynchronous networks
are modeled as systems with infinite upper transmission bounds. 

Our results show that while the dissemination pattern of asynchronous 
causality is a fairly straightforward extension of the happened-before rela- 
tion into message chains, synchronous causality spreads in rather complex 
patterns that combine message deliveries and timing guarantees. 

Formally, our study is based on knowledge-based analysis [15]. As it 
turns out, the notion of knowledge provides a very close formal approxi- 
mation of causal influence, which also underlies temporal precedence. In 
fact, the various forms of temporal event orderings that we examine are 
each reduced to corresponding epistemic conditions. The two basic kinds
are linear ordering, which is reduced to nested knowledge, and simultaneous
ordering, which is reduced to a common knowledge requirement. In Chapter 6
these two basic ordering types are combined, providing a characterization
of the causal pattern requirement for any given partial ordering on events.
Chapter 7 introduces a complementary approach, studying the minimal
requirements needed for a process to ensure that an event at a remote site
has not yet transpired.

The current study opens up many possible avenues for further research.
An immediate concern would be to further extend the study of causality 
into weaker systems, such as systems with clock drifts, or systems where 
the processes have only partial knowledge of the bounds on communication. 
In such networks two learning dynamics intertwine. The first is the dynamics
covered in this thesis, which explains how information about events in the
current run is spread in the system. A second process of information flow
concerns the gradual learning of the processes about the communication
characteristics of the underlying network. Relevant information to be gained 
here for a process is not only what the actual bounds are, but also what other 
processes may have learned about these bounds. 

More generally, the model suggested here makes pretty strong synchrony 
assumptions. Weakening the model by removing the global clock, reducing 
the available knowledge about network characteristics, and allowing failures, 
is necessary in order to bring our results closer to real world applications. 

Another salient extension to system characteristics would be to consider 
mobile networks. Here, given the graveness of energy considerations and the 
multi-hop nature of communication, our characterizations of minimal com- 
munication requirements may be highly relevant. In such systems though, 
we expect not only the bounds to vary throughout a run, but also the num- 
ber and identity of participating processes. Some causal analyses have been 
suggested [45, 4], but they follow the classical asynchronous paradigm. 

Another direction for further investigation would be to consider the
finer-grained patterns that are introduced if we consider protocol-specific
knowledge. The thesis follows the approach of Chandy and Misra [7], in charting
out the communication that is universally necessary for knowledge gain. It 
is clear that once a specific protocol is considered, the set of communication 
patterns that lead to knowledge gain is reduced. First, it may be that pro- 
cess i does not gain any information regarding an occurrence at j despite 
the existence of a causal connection, simply because the messages relating 
the two processes do not convey this fact (a typical case is a NULL mes- 
sage, that carries very limited information content). Moreover, processes 
may delay message relaying, or may be committed to a specific communica- 
tion channel despite the existence of several alternatives. A protocol-specific 
characterization of knowledge gain, and hence of temporal ordering, would 
take all such protocol-specified limitations on communication into account. 
We would expect tighter necessity conditions for knowledge gain here. 

Lamport's definition for the concurrency of two events $e$ and $e'$ in [26] is
that $e \not\rightsquigarrow e'$ and $e' \not\rightsquigarrow e$. Further inquiry into concurrency in synchronous
systems is also in order. Of course, given the global clock, the obvious
candidate for concurrency is that both $e$ and $e'$ occur at exactly the same
time. However, a more subtle approach may be appropriate here. Despite
the possibility of actually confirming the exact time of occurrence of events,
in many cases the concurrency of two events is actually accidental. What
we are really after is a notion of "temporal independence": that events $e$
and $e'$ may occur in any temporal order in relation to each other. Such
an inquiry may serve to extend the measure of parallelism in a protocol by 
identifying and uncoupling simultaneous occurrences that should really be 
temporally independent. 
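
Lamport's notion of concurrency mentioned above is easy to state operationally. The following is a minimal sketch, assuming the happened-before relation of a run is given as a directed acyclic graph over event names; the graph, the event names and the function names are all hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Set


def reachable_from(start: str, edges: Dict[str, List[str]]) -> Set[str]:
    """Events that `start` happened-before, found by following direct edges."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        for nxt in edges.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def concurrent(e: str, f: str, edges: Dict[str, List[str]]) -> bool:
    """Two events are concurrent when neither happened-before the other."""
    return f not in reachable_from(e, edges) and e not in reachable_from(f, edges)


if __name__ == "__main__":
    # Direct happened-before edges of a hypothetical run.
    edges = {"send": ["recv"], "recv": ["reply"], "local": []}
    print(concurrent("send", "reply", edges))
    print(concurrent("local", "recv", edges))
```

In this example "send" and "reply" are causally ordered through "recv", while "local" is related to neither, so only the second pair is concurrent. The sketch captures the asynchronous definition only; it says nothing about the notion of temporal independence discussed in the text.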

Several possible extensions of the thesis may be of less relevance to the 
distributed systems community, referring instead to the study of multi-agent 
systems under other disciplines. As we mentioned in Chapter 7, one such 
case is the study of "knowledge of ignorance". For one process to be able
to tell that another process is unaware of a certain occurrence seems to be 
highly relevant for systems which are of a competitive nature, as studied in 
game theory. The existing results presented here are by and large incomplete,
as we do not know what more complex epistemic states involving ignorance,
such as $K_{i_3}\neg K_{i_2}\neg K_{i_1}(\mathit{occurred}(e_0) \wedge \mathit{ND}(e_0))$, would require in terms
of (dis)communication. Another possible research direction is the relaxation 
of knowledge into belief. Formal systems involving belief are ubiquitous in
game theory and also in the general study of multi-agent systems [3, 49]. 
Do the causal patterns that characterize knowledge gain also characterize 
the spread of belief? We do not currently know. 

Finally, our results as presented give a "snapshot" of the communication 
pattern that must exist, if knowledge has been gained. They suggest that
the dynamics of information flow is such that information moves "outward"
from the site of occurrence of an ND event. Yet we do not prove that
such is the case. We have begun to extend our results in this direction too.
Here some subtle considerations must be made in order to accommodate 
information "flow". Also, as it turns out, one must consider the possibility 
that certain information, say fact (p, is actually spread as separate packets, 
each containing a fraction of the required information. 

All in all, we believe (and hope) that synchronous causality, along with 
the new concepts and methodology presented in this thesis, will turn out to 
be a fruitful advancement in the field of distributed computing, as well as 
for multi-agent systems in general.